Abstract


This paper considers two-sided tests for the parameter of an endogenous variable in an instrumental variable (IV) model with heteroskedastic and autocorrelated errors. We develop the finite-sample theory of weighted-average power (WAP) tests with normal errors and a known long-run variance. We introduce two weights which are invariant to orthogonal transformations of the instruments, e.g., changing the order in which the instruments appear. While tests using the MM1 weight can be severely biased, optimal tests based on the MM2 weight are naturally two-sided when errors are homoskedastic.

We propose two boundary conditions that yield two-sided tests whether errors are homoskedastic or not. The locally unbiased (LU) condition is related to the power around the null hypothesis and is a weaker requirement than unbiasedness. The strongly unbiased (SU) condition is more restrictive than LU, but the associated WAP tests are easier to implement. Several tests are SU in finite samples or asymptotically, including tests robust to weak IV (such as the Anderson-Rubin, score, conditional quasi-likelihood ratio, and I. Andrews' (2015) PI-CLC tests) and two-sided tests which are optimal when the sample size is large and instruments are strong.

We refer to the WAP-SU tests based on our weights as MM1-SU and MM2-SU tests. Dropping the restrictive assumptions of normality and known variance, the theory is shown to remain valid at the cost of asymptotic approximations. The MM2-SU test is optimal under the strong IV asymptotics, and outperforms other existing tests under the weak IV asymptotics.




1 Introduction 


In an instrumental variable (IV) model, researchers often rely on asymptotic approximations when making inference on the structural coefficients. These approximations, however, can be poor when instruments are weakly correlated with the endogenous regressors, as explained by Nelson and Startz (1990), Bound, Jaeger, and Baker (1995), Dufour (1997), and Staiger and Stock (1997). The goal is to find reliable econometric methods regardless of how strong the instruments are.

There has been some progress in the IV model with one endogenous variable and k instruments when errors are homoskedastic. Anderson and Rubin (1949) propose a test statistic which has an asymptotic chi-square-k distribution regardless of how weak the instruments are. Moreira (2001, 2009) shows that the Anderson-Rubin statistic is optimal in the just-identified model, but points out potential power gains when there is more than one instrument. Kleibergen (2002) and Moreira (2002) show that a score (LM) test statistic has a standard chi-square-one distribution whether the instruments are weak or not. Moreira (2003) proposes to replace the fixed critical value by conditional quantiles of test statistics. These conditional tests are similar by construction, and hence have correct size. He applies the conditional method to the likelihood ratio (LR) statistic and the two-sided Wald statistic. Andrews, Moreira, and Stock (2006a) (hereinafter, AMS06) show that the conditional likelihood ratio (CLR) test satisfies natural orthogonal invariance conditions and is nearly optimal. Andrews, Moreira, and Stock (2007) find, however, that conditional Wald (CW) tests have poor behavior and object to their use in empirical work. Mills, Moreira, and Vilela (2014a) show that the bad performance of CW tests is due to the asymmetric distribution of one-sided Wald statistics when instruments are weak. By extending Moreira's (2003) conditional approach, they find approximately unbiased Wald tests whose power is comparable to that of the CLR test.

While use of the IV model with homoskedastic errors was important to advance the literature on weak identification, the IV model with heteroskedastic and autocorrelated (HAC) errors is considerably more relevant for applied researchers. Some of the theoretical findings for homoskedastic errors are easily extended to more complicated stochastic processes, whereas others are not. Important work by Stock and Wright (2000), Guggenberger and Smith (2005), Kleibergen (2006), Otsu (2006), and Andrews and Mikusheva (2015), among others, extends the tests conceived for the simple homoskedastic IV model to the generalized method of moments (GMM) and generalized empirical likelihood (GEL) frameworks. Their tests are of course applicable to the HAC-IV model, but it is unknown whether these adaptations are optimal. The purpose of this paper is exactly this: to develop a theory of optimal two-sided tests for the HAC-IV model.

We are able to find a statistic that is pivotal and independent of a second statistic, which is sufficient and complete for the instruments' coefficients under the null. We show that the invariance argument of AMS06 for homoskedastic errors is only applicable if a (long-run) variance has a Kronecker product structure. This limitation has profound consequences for the behavior of weighted-average power (WAP) tests. We choose two priors for the structural parameter and the instruments' coefficients and denote the associated test statistics MM1 and MM2. The priors are chosen to illustrate the effect of a poor weight choice on the power of WAP tests. Although priors vanish asymptotically as in the Bernstein-von Mises theorem, the associated tests can behave quite differently in finite samples (or under the weak-instrument asymptotics). When a variance matrix has a Kronecker product structure, both test statistics are orthogonally invariant, but only MM2 satisfies an additional sign invariance argument that preserves the two-sided hypothesis testing problem. As a consequence, a WAP similar test based on the MM1 statistic can behave as a one-sided test and have poor power even with homoskedastic errors (this problem is analogous to the conditional Wald tests documented by Andrews, Moreira, and Stock (2007)), while the WAP similar test using the MM2 statistic has overall good power with a Kronecker-product variance matrix. Other weight choices face the same difficulties as the MM1 statistic for the HAC-IV model, including the WAP similar test recently proposed by Olea (2015), denoted ECS (HAC-IV).

When the (long-run) variance matrix does not have a Kronecker product representation and the model is exactly identified, the Anderson-Rubin test (among other equivalent tests) is the uniformly most powerful unbiased test. In the over-identified model, we show theoretically that it is possible to find a weight so that the test is approximately unbiased and admissible. The lack of invariance, however, makes it harder to construct such weights. In practice, we endogenize this search by imposing in the WAP maximization problem a boundary condition based on the local power around the null hypothesis. This locally unbiased (LU) condition is a weaker requirement than unbiasedness, so it does not rule out admissibility. The WAP-LU tests are found with nonlinear algorithms, which makes them difficult to implement. We then propose a stronger requirement than LU, denoted the strongly unbiased (SU) condition. The resulting class of tests includes several two-sided tests robust to weak IV, including the Anderson-Rubin, score, and (pseudo) likelihood ratio tests by Kleibergen (2006) and Andrews and Guggenberger (2014b), and I. Andrews' (2015) PI-CLC tests. Two-sided optimal tests also satisfy the SU condition asymptotically when the sample size is large and instruments are strong. The WAP-SU tests have power close to the WAP-LU tests based on the MM1 and MM2 weights, with the advantage that the WAP-SU tests are easy to implement with a standard linear programming software package. We refer to the WAP-SU tests based on our weights as MM1-SU and MM2-SU tests.


We follow I. Andrews (2015) and implement numerical simulations based on Yogo (2004). We choose, however, Yogo's (2004) design where the endogenous variable is the real stock return and the instruments are genuinely weak. We find that, as our theory predicts, the WAP similar tests can be quite erratic. In some designs, they behave as usual two-sided tests and have good power. In other designs, they behave as one-sided tests and have power near zero. We do not recommend the MM1 and MM2 similar tests for empirical researchers. The MM2-SU test, however, outperforms other tests (including the MM1-SU test), and when it occasionally has less power than competing tests, the power loss is small. We recommend the use of the MM2-SU test in empirical work. Our asymptotic analysis is quite general and encompasses all WAP similar and WAP-SU tests whose weight does not depend strongly on the sample size.

The remainder of this paper is organized as follows. Section 2 introduces the HAC-IV model and presents the test statistics, including the MM1 and MM2 statistics. Sections 3 and 4 discuss the power maximization problem and the WAP-LU and WAP-SU tests. Section 5 presents power curves and the role of the LU and SU conditions in obtaining WAP tests with overall good power. Section 6 develops an asymptotic framework that encompasses the weak IV and strong IV asymptotics. Section 7 revisits the work of I. Andrews (2015) and Yogo (2004) on testing the intertemporal rate of substitution, with one important modification. Section 8 contains concluding remarks. All proofs are given in the appendices.


2 The IV Model and Statistics 

Consider the instrumental variable model

$$y_1 = y_2\beta + u, \qquad y_2 = Z\pi + v_2,$$

where $y_1$ and $y_2$ are $n \times 1$ vectors of observations on two endogenous variables, $Z$ is an $n \times k$ matrix of nonrandom exogenous variables having full column rank, and $u$ and $v_2$ are $n \times 1$ unobserved disturbance vectors having mean zero. The goal here is to test the null hypothesis $H_0: \beta = \beta_0$ against the alternative hypothesis $H_1: \beta \neq \beta_0$, treating $\pi$ as a nuisance parameter. We do not include covariates in this model, but we note that they can be easily handled by the usual projection arguments; see AMS06.

We look at the reduced-form model for $Y = [y_1, y_2]$:

$$Y = Z\pi a' + V, \tag{2.1}$$

where $a = (\beta, 1)'$ and $V = [v_1, v_2] = [u + v_2\beta, v_2]$ is the $n \times 2$ matrix of reduced-form errors. We allow the errors to be heteroskedastic and autocorrelated. Let $P_1 = Z(Z'Z)^{-1/2}$ and let $[P_1, P_2] \in \mathcal{O}_n$, the group of $n \times n$ orthogonal matrices. Pre-multiplying the reduced-form model (2.1) by $[P_1, P_2]'$, we obtain the pair of statistics $P_1'Y$ and $P_2'Y$. In this section, we assume that $\mathrm{vec}((Z'Z)^{-1/2}Z'V)$ is normally distributed with mean zero and known variance matrix $\Sigma$ (this assumption will be relaxed later at the cost of asymptotic approximations). The statistic $P_2'Y$ is ancillary and we do not have prior knowledge about the correlation structure of $V$. Consequently, we consider tests based on $P_1'Y$:

$$R = (Z'Z)^{-1/2}Z'Y = \mu a' + (Z'Z)^{-1/2}Z'V,$$

where $\mu = (Z'Z)^{1/2}\pi$.

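It is convenient to find the one-to-one transformation of $R$ given by the pair

$$S = [(b_0' \otimes I_k)\Sigma(b_0 \otimes I_k)]^{-1/2}(b_0' \otimes I_k)\,\mathrm{vec}(R) \quad \text{and} \quad T = [(a_0' \otimes I_k)\Sigma^{-1}(a_0 \otimes I_k)]^{-1/2}(a_0' \otimes I_k)\Sigma^{-1}\,\mathrm{vec}(R), \tag{2.2}$$

where $\mathrm{vec}(R) = \mathrm{vec}((Z'Z)^{-1/2}Z'Y)$ stacks the columns of $R$, $a_0 = (\beta_0, 1)'$ and $b_0 = (1, -\beta_0)'$. The pair $S$ and $T$ have three important properties: (i) they are independent; (ii) $S$ is pivotal; and (iii) $T$ is complete and sufficient for $\mu$ under the null. More specifically, the statistics $S$ and $T$ have distribution

$$S \sim N((\beta - \beta_0)C_{\beta_0}\mu, I_k) \quad \text{and} \quad T \sim N(D_\beta\mu, I_k), \tag{2.3}$$

where

$$C_{\beta_0} = [(b_0' \otimes I_k)\Sigma(b_0 \otimes I_k)]^{-1/2} \quad \text{and} \quad D_\beta = [(a_0' \otimes I_k)\Sigma^{-1}(a_0 \otimes I_k)]^{-1/2}(a_0' \otimes I_k)\Sigma^{-1}(a \otimes I_k).$$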

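The joint density $f_{\beta,\mu}(s,t)$ is given by

$$f_{\beta,\mu}(s,t) = (2\mathrm{pi})^{-k/2}\exp\left(-\frac{\|s - (\beta-\beta_0)C_{\beta_0}\mu\|^2}{2}\right) \times (2\mathrm{pi})^{-k/2}\exp\left(-\frac{\|t - D_\beta\mu\|^2}{2}\right) = f^S_{\beta,\mu}(s) \times f^T_{\beta,\mu}(t),$$

where pi = 3.1415... and $f^S_{\beta,\mu}(s)$ and $f^T_{\beta,\mu}(t)$ are the marginal densities for $S$ and $T$.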
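A minimal NumPy sketch of (2.2) and (2.4) follows. It assumes that $\Sigma$ is ordered like the column-major $\mathrm{vec}(R)$ (first column of $R$, then the second) and uses symmetric matrix square roots. The helper names `sym_inv_sqrt`, `s_t_statistics` and `ar_statistic`, as well as the random inputs, are hypothetical and chosen only for illustration.

```python
import numpy as np


def sym_inv_sqrt(A):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def s_t_statistics(R, Sigma, beta0):
    """Compute (S, T) in (2.2) from the k x 2 matrix R = (Z'Z)^{-1/2} Z'Y.

    Sigma is the 2k x 2k variance of vec((Z'Z)^{-1/2} Z'V), ordered like the
    column-major vec(R): first column of R, then second column.
    """
    k = R.shape[0]
    a0 = np.array([[beta0], [1.0]])        # a_0 = (beta_0, 1)'
    b0 = np.array([[1.0], [-beta0]])       # b_0 = (1, -beta_0)'
    vec_r = R.flatten(order="F")           # vec() stacks the columns
    B0 = np.kron(b0.T, np.eye(k))          # b_0' (x) I_k, a k x 2k matrix
    A0 = np.kron(a0, np.eye(k))            # a_0 (x) I_k, a 2k x k matrix
    Sigma_inv = np.linalg.inv(Sigma)
    S = sym_inv_sqrt(B0 @ Sigma @ B0.T) @ B0 @ vec_r
    T = sym_inv_sqrt(A0.T @ Sigma_inv @ A0) @ A0.T @ Sigma_inv @ vec_r
    return S, T


def ar_statistic(S):
    """Anderson-Rubin statistic AR = S'S, as in (2.4)."""
    return float(S @ S)


# Usage with an arbitrary (hypothetical) variance matrix and data.
rng = np.random.default_rng(0)
k = 3
M = rng.standard_normal((2 * k, 2 * k))
Sigma = M @ M.T + 2 * k * np.eye(2 * k)
R = rng.standard_normal((k, 2))
S, T = s_t_statistics(R, Sigma, beta0=0.5)
print(ar_statistic(S))
```

Because $b_0'a_0 = 0$, the cross-covariance of $S$ and $T$ is $C_{\beta_0}(b_0'a_0 \otimes I_k)[\cdot]^{-1/2} = 0$. This zero cross-covariance is what delivers property (i).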

Examples of test statistics based on $S$ and $T$ are the Anderson-Rubin (AR), the score or Lagrange multiplier (LM), and the (quasi) likelihood ratio (LR and QLR) statistics. Anderson and Rubin (1949) propose to use a pivotal statistic. In our model the Anderson-Rubin statistic is given by

$$AR = S'S. \tag{2.4}$$

In Appendix A, we derive the LM and LR statistics under the assumption that the errors are normal. For any full column rank matrix $X$, let $N_X = X(X'X)^{-1}X'$ and $M_X = I - N_X$. Then the LM statistic simplifies to

LM = (2.5) 

The likelihood ratio statistic is given by 

LR = - T'T. (2.6) 
The LR statistic is apparently not a simple function of $S$ and $T$ (which makes it difficult to implement the test coupled with conditional critical values). Kleibergen (2006) instead adapts the formula for the likelihood ratio statistic derived by Moreira (2003) in the homoskedastic IV model to the GMM framework. For the HAC-IV model, this quasi likelihood ratio statistic becomes

$$QLR = \frac{AR - r(T) + \sqrt{(AR - r(T))^2 + 4\,LM \cdot r(T)}}{2}, \tag{2.7}$$

where AR and LM are defined in (2.4) and (2.5), and $r(T) = T'T$. Andrews and Guggenberger (2014b) use a Kronecker product approximation $\Omega \otimes \Phi$ to the variance $\Sigma$, where $\Omega$ and $\Phi$ are positive-definite matrices with dimensions $2 \times 2$ and $k \times k$, respectively; see Van Loan and Pitsianis (1993) for more details on Kronecker product approximations.
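Formula (2.7) can be coded directly. The small sketch below takes AR, LM and $r(T)$ as given numbers. The function name `qlr_statistic` is hypothetical.

```python
import math


def qlr_statistic(ar, lm, r_t):
    """Quasi likelihood ratio statistic (2.7) computed from AR, LM and r(T)."""
    gap = ar - r_t
    return 0.5 * (gap + math.sqrt(gap ** 2 + 4.0 * lm * r_t))


# Two limiting cases that follow from (2.7):
print(qlr_statistic(3.0, 1.0, 0.0))   # r(T) = 0 gives QLR = AR
print(qlr_statistic(3.0, 1.0, 1e6))   # a large r(T) gives QLR close to LM
```

Squaring both sides of the inequalities shows that, whenever $0 \le LM \le AR$, the QLR value lies between LM and AR. The formula therefore interpolates between the two statistics as $r(T)$ moves from zero to infinity.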

We now present two novel WAP statistics based on the weighted-average density

$$h_\Lambda(s,t) = \int f_{\beta,\mu}(s,t)\, d\Lambda(\beta,\mu). \tag{2.8}$$

These weight functions use the Kronecker product approximation $\Omega \otimes \Phi$ to $\Sigma$ with the Frobenius norm (i.e., the norm of a matrix $X$ is given by $\|X\| = (\mathrm{tr}(X'X))^{1/2}$).
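One standard way to compute a Frobenius-norm nearest Kronecker product is the Van Loan and Pitsianis rearrangement followed by a rank-one SVD. The sketch below illustrates that technique; it is not necessarily the exact routine used for the MM weights. The function name is hypothetical. Only the product $\Omega \otimes \Phi$ is identified, so splitting the scale evenly through $\sqrt{\sigma_1}$ is an arbitrary normalization.

```python
import numpy as np


def nearest_kronecker(Sigma, k):
    """Frobenius-norm nearest Kronecker product Omega (x) Phi to a 2k x 2k matrix,
    via the Van Loan-Pitsianis rearrangement and a rank-one SVD."""
    # Row p of the rearranged matrix is vec of the k x k block Sigma_ij, with rows
    # ordered as in vec(Omega) (column-major: (0,0), (1,0), (0,1), (1,1)).
    rows = []
    for j in range(2):
        for i in range(2):
            block = Sigma[i * k:(i + 1) * k, j * k:(j + 1) * k]
            rows.append(block.flatten(order="F"))
    Rmat = np.array(rows)                       # 4 x k^2, equals vec(Omega) vec(Phi)' if exact
    U, s, Vt = np.linalg.svd(Rmat, full_matrices=False)
    Omega = np.sqrt(s[0]) * U[:, 0].reshape((2, 2), order="F")
    Phi = np.sqrt(s[0]) * Vt[0].reshape((k, k), order="F")
    if np.trace(Phi) < 0:                       # singular vectors are fixed only up to a joint sign
        Omega, Phi = -Omega, -Phi
    return Omega, Phi


# Usage: recover the factors of an exact Kronecker product.
Om = np.array([[1.0, 0.3], [0.3, 2.0]])
Ph = np.array([[2.0, 0.5], [0.5, 1.0]])
Om_hat, Ph_hat = nearest_kronecker(np.kron(Om, Ph), 2)
print(np.allclose(np.kron(Om_hat, Ph_hat), np.kron(Om, Ph)))
```

By the Eckart-Young theorem, the rank-one truncation minimizes $\|\Sigma - \Omega \otimes \Phi\|$, because this norm equals the Frobenius distance between the rearranged matrix and $\mathrm{vec}(\Omega)\mathrm{vec}(\Phi)'$.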

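For the MM1 statistic $h_1(s,t)$, we choose $\Lambda(\beta,\mu)$ to be $N(\beta_0, 1) \times N(0, \sigma^2\Phi)$. For the MM2 statistic $h_2(s,t)$, we first define the identity $\tan(\theta) = d_{\beta(\theta)}/c_{\beta(\theta)}$, where

$$c_\beta = (\beta - \beta_0)\cdot(b_0'\Omega b_0)^{-1/2} \quad \text{and} \quad d_\beta = a'\Omega^{-1}a_0\cdot(a_0'\Omega^{-1}a_0)^{-1/2}. \tag{2.9}$$

We choose $\Lambda(\beta,\mu)$ so that the priors for $\theta$ and $\mu$ are $\mathrm{Unif}[-\mathrm{pi}, \mathrm{pi}] \times N(0, \|l_{\beta(\theta)}\|^{-2}\zeta\cdot\Phi)$, where $l_\beta = (c_\beta, d_\beta)'$.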

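In Appendix A, we show that the MM1 and MM2 statistics are

$$h_1(s,t) = (2\mathrm{pi})^{-(2k+1)/2}\int |\Psi_{\beta,\sigma^2}|^{-1/2}\exp\left(-\frac{(s',t')\Psi_{\beta,\sigma^2}^{-1}(s',t')' + (\beta-\beta_0)^2}{2}\right) d\beta,$$

$$h_2(s,t) = (2\mathrm{pi})^{-(k+1)}\int_{-\mathrm{pi}}^{\mathrm{pi}} |\Psi_{\beta(\theta),\|l_{\beta(\theta)}\|^{-2}\zeta}|^{-1/2}\exp\left(-\frac{(s',t')\Psi_{\beta(\theta),\|l_{\beta(\theta)}\|^{-2}\zeta}^{-1}(s',t')'}{2}\right) d\theta, \tag{2.10}$$

where the matrix $\Psi_{\beta,\sigma^2}$ is given by

$$\Psi_{\beta,\sigma^2} = I_{2k} + \sigma^2\begin{bmatrix}(\beta-\beta_0)^2 C_{\beta_0}\Phi C_{\beta_0}' & (\beta-\beta_0)C_{\beta_0}\Phi D_\beta' \\ (\beta-\beta_0)D_\beta\Phi C_{\beta_0}' & D_\beta\Phi D_\beta'\end{bmatrix}. \tag{2.11}$$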
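The one-dimensional integral in $h_1(s,t)$ can be approximated by Gauss-Hermite quadrature over $\beta \sim N(\beta_0, 1)$. The integrand is the $N(0, \Psi_{\beta,\sigma^2})$ density of $(s', t')'$ that results from integrating out $\mu$. The sketch below assumes that $\Sigma$, $\Phi$ and $\sigma^2$ are supplied by the user. The function names and the choice of 60 nodes are hypothetical, and the closed-form derivation in Appendix A is not reproduced.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.stats import multivariate_normal


def sym_inv_sqrt(A):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def loadings(Sigma, beta0):
    """Return C_{beta0} and a function beta -> D_beta, as defined after (2.3)."""
    k = Sigma.shape[0] // 2
    I_k = np.eye(k)
    A0 = np.kron(np.array([[beta0], [1.0]]), I_k)      # a_0 (x) I_k
    B0 = np.kron(np.array([[1.0, -beta0]]), I_k)       # b_0' (x) I_k
    Si = np.linalg.inv(Sigma)
    C = sym_inv_sqrt(B0 @ Sigma @ B0.T)
    G = sym_inv_sqrt(A0.T @ Si @ A0) @ A0.T @ Si
    return C, lambda beta: G @ np.kron(np.array([[beta], [1.0]]), I_k)


def mm1_density(s, t, Sigma, Phi, beta0, sigma2, n_nodes=60):
    """Approximate h_1(s, t) in (2.10) by Gauss-Hermite quadrature in beta."""
    k = s.shape[0]
    x = np.concatenate([s, t])
    C, D_of = loadings(Sigma, beta0)
    nodes, weights = hermegauss(n_nodes)    # quadrature for the weight exp(-z^2 / 2)
    total = 0.0
    for z, w in zip(nodes, weights):
        beta = beta0 + z
        L = np.vstack([(beta - beta0) * C, D_of(beta)])   # 2k x k loading of mu
        Psi = np.eye(2 * k) + sigma2 * L @ Phi @ L.T     # Psi_{beta, sigma^2} in (2.11)
        total += w * multivariate_normal.pdf(x, mean=np.zeros(2 * k), cov=Psi)
    return total / np.sqrt(2.0 * np.pi)     # the weights sum to sqrt(2 pi)


rng = np.random.default_rng(1)
k = 2
M = rng.standard_normal((2 * k, 2 * k))
Sigma = M @ M.T + np.eye(2 * k)
Phi = np.eye(k)          # hypothetical; in the text Phi comes from the Kronecker approximation
s, t = np.array([0.5, -1.0]), np.array([1.2, 0.3])
print(mm1_density(s, t, Sigma, Phi, beta0=0.0, sigma2=1.0))
```

Setting $\sigma^2 = 0$ removes the prior on $\mu$ and returns the $N(0, I_{2k})$ density, which provides a simple sanity check. The MM2 density can be treated in the same way with a uniform grid in $\theta$, after solving $\tan(\theta) = d_\beta/c_\beta$ for $\beta(\theta)$.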
2.1 Kronecker Variance Matrix

We consider here the special case where $\Sigma = \Omega \otimes \Phi$ exactly. This framework is particularly interesting for two reasons. First, it encompasses the homoskedastic case by taking $\Phi$ to be the identity matrix. We will show that the $S$ and $T$ statistics for general error structure simplify to the original statistics of Moreira (2001, 2009) for the homoskedastic model. Second, the model where $\Sigma$ has a Kronecker product structure enjoys natural invariance properties. Some statistics are invariant but others are not. This has profound consequences for testing procedures based on these statistics. Indeed, typical tests based on noninvariant statistics (such as those using a constant or Moreira's (2003) conditional critical value function) behave as one-sided tests for parts of the parameter space. We will illustrate this problem numerically in Section 5.


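When $\Sigma = \Omega \otimes \Phi$, the statistics $S$ and $T$ defined in (2.2) simplify to

$$S = \Phi^{-1/2}Rb_0\cdot(b_0'\Omega b_0)^{-1/2} \quad \text{and} \quad T = \Phi^{-1/2}R\Omega^{-1}a_0\cdot(a_0'\Omega^{-1}a_0)^{-1/2}. \tag{2.12}$$

Their distribution is given by

$$S \sim N(c_\beta\Phi^{-1/2}\mu, I_k) \quad \text{and} \quad T \sim N(d_\beta\Phi^{-1/2}\mu, I_k). \tag{2.13}$$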


AMS06 use invariance arguments for the special case $\Phi = I_k$. However, the parameter $\mu_\Phi = \Phi^{-1/2}\mu$ is unknown because $\mu$ is unknown. Hence, AMS06's invariance argument applies to the new parameter $\mu_\Phi$. Specifically, let $g \in \mathcal{O}_k$ and consider the transformation in the sample space

$$g \circ (S, T) = (gS, gT).$$

The induced transformation in the parameter space is $g \circ (\beta, \mu_\Phi) = (\beta, g\mu_\Phi)$.

Invariant tests depend on the data only through

$$Q = \begin{bmatrix} Q_S & Q_{ST} \\ Q_{ST} & Q_T \end{bmatrix} = \begin{bmatrix} S'S & S'T \\ S'T & T'T \end{bmatrix}. \tag{2.14}$$


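The density of $Q$ at $q$ for the parameters $\beta$ and $\lambda = \pi'(Z'Z)^{1/2}\Phi^{-1}(Z'Z)^{1/2}\pi$ is given by

$$f_{\beta,\lambda}(q_S, q_{ST}, q_T) = K_0\exp\left(-\frac{\lambda(c_\beta^2 + d_\beta^2)}{2}\right)\det(q)^{(k-3)/2}\exp\left(-\frac{q_S + q_T}{2}\right)(\lambda\xi_\beta(q))^{-(k-2)/4}\, I_{(k-2)/2}\left(\sqrt{\lambda\xi_\beta(q)}\right),$$

where $K_0^{-1} = 2^{(k+2)/2}\,\mathrm{pi}^{1/2}\,\Gamma((k-1)/2)$, $\Gamma(\cdot)$ is the gamma function, $I_{(k-2)/2}(\cdot)$ denotes the modified Bessel function of the first kind, and

$$\xi_\beta(q) = c_\beta^2 q_S + 2c_\beta d_\beta q_{ST} + d_\beta^2 q_T. \tag{2.15}$$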

The following proposition shows that the WAP densities $h_1(s,t)$ and $h_2(s,t)$ are invariant when the covariance matrix is a Kronecker product. Indeed, the Kronecker product approximation $\Omega \otimes \Phi$ to $\Sigma$ in the definition of the weights was chosen exactly to guarantee that the test statistics are orthogonally invariant.

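AMS06 show there also exists a sign transformation that preserves the two-sided hypothesis testing problem. Consider the group $\mathcal{O}_1$, which contains only two elements: $g \in \{-1, 1\}$. The group transformation in the sample space is

$$g \circ (Q_S, Q_{ST}, Q_T) = (Q_S, g \cdot Q_{ST}, Q_T),$$

whose maximal invariant is $Q_S$, $|Q_{ST}|$, and $Q_T$. This group yields a transformation in the parameter space. For $g = -1$, AMS06 show that this transformation is

$$g \circ (\beta, \lambda) = \left(\beta_0 - \frac{d_{\beta_0}(\beta - \beta_0)}{d_{\beta_0} + 2j_{\beta_0}(\beta - \beta_0)},\; \lambda\,\frac{(d_{\beta_0} + 2j_{\beta_0}(\beta - \beta_0))^2}{d_{\beta_0}^2}\right), \quad \text{where } j_{\beta_0} = \frac{e_1'\Omega^{-1}a_0}{(a_0'\Omega^{-1}a_0)^{1/2}} \text{ and } e_1 = (1, 0)' \tag{2.16}$$

(by the definition of a group, the parameter remains unaltered at $g = 1$). The transformation in (2.16) flips the sign of $\beta - \beta_0$ for $\beta \neq \beta_{AR}$, defined as

$$\beta_{AR} = \frac{\omega_{11} - \omega_{12}\beta_0}{\omega_{12} - \omega_{22}\beta_0}, \quad \text{where } \Omega = \begin{bmatrix}\omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22}\end{bmatrix}. \tag{2.17}$$

So the sign transformation preserves the two-sided hypothesis testing problem $H_0: \beta = \beta_0$ against $H_1: \beta \neq \beta_0$, but not the one-sided problem, e.g., testing $H_0: \beta \le \beta_0$ against $H_1: \beta > \beta_0$.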
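The density of $Q$ depends on $(\beta, \lambda)$ only through $\lambda c_\beta^2$, $\lambda c_\beta d_\beta$ and $\lambda d_\beta^2$. The parameter map (2.16) should therefore undo the data change $q_{ST} \to -q_{ST}$: it should keep $\lambda c_\beta^2$ and $\lambda d_\beta^2$ and flip the sign of $\lambda c_\beta d_\beta$. The numerical check below illustrates this for an arbitrary (hypothetical) $\Omega$, $\beta_0$ and $\lambda$. Here $d_{\beta_0} = (a_0'\Omega^{-1}a_0)^{1/2}$ is $d_\beta$ from (2.9) evaluated at $\beta = \beta_0$. The helper names are illustrative.

```python
import numpy as np


def c_d(beta, beta0, Omega):
    """c_beta and d_beta from (2.9)."""
    a, a0, b0 = np.array([beta, 1.0]), np.array([beta0, 1.0]), np.array([1.0, -beta0])
    Oi = np.linalg.inv(Omega)
    return (beta - beta0) / np.sqrt(b0 @ Omega @ b0), (a @ Oi @ a0) / np.sqrt(a0 @ Oi @ a0)


def sign_map(beta, lam, beta0, Omega):
    """The transformation (2.16) for g = -1, applied to (beta, lambda)."""
    a0, e1 = np.array([beta0, 1.0]), np.array([1.0, 0.0])
    Oi = np.linalg.inv(Omega)
    d0 = np.sqrt(a0 @ Oi @ a0)               # d_{beta0}, i.e. d_beta at beta = beta0
    j0 = (e1 @ Oi @ a0) / d0                 # j_{beta0}
    den = d0 + 2.0 * j0 * (beta - beta0)     # the map is undefined where den = 0
    return beta0 - d0 * (beta - beta0) / den, lam * den ** 2 / d0 ** 2


Omega = np.array([[1.0, 0.4], [0.4, 0.8]])
beta0, lam = 0.3, 2.5
for beta in (-2.0, -0.5, 0.9, 1.7, 4.0):
    c, d = c_d(beta, beta0, Omega)
    beta_s, lam_s = sign_map(beta, lam, beta0, Omega)
    c_s, d_s = c_d(beta_s, beta0, Omega)
    # lambda c^2 and lambda d^2 should be unchanged, and lambda c d should change sign
    print(np.allclose([lam_s * c_s ** 2, lam_s * d_s ** 2, lam_s * c_s * d_s],
                      [lam * c ** 2, lam * d ** 2, -lam * c * d]))
```

In this example the map leaves $\beta_0$ unchanged. One can also check that it maps $\beta_{AR}$ from (2.17) (here $\beta_{AR} = 5.5$) to itself.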


Proposition 1. The following hold when $\Sigma = \Omega \otimes \Phi$:

(i) The weighted-average densities $h_1(s,t)$ and $h_2(s,t)$ are invariant to orthogonal transformations. That is, they depend on the data only through $Q$; and

(ii) The weighted-average density $h_2(s,t)$ is invariant to sign transformations. It depends on the data only through $Q_S$, $|Q_{ST}|$, and $Q_T$.


The MM1 statistic is not sign invariant. We can create a weighted-average statistic that is sign invariant by replacing the weight in $h_1 = \int f_{\beta,\lambda}(q_S, q_{ST}, q_T)\, d\Lambda_1(\beta,\lambda)$ by

$$\Lambda(\beta,\lambda) = \frac{\Lambda_1(\beta,\lambda) + \Lambda_1(g \circ (\beta,\lambda))}{2} \tag{2.18}$$

for $g = -1$. We note that

$$\int f_{\beta,\lambda}(q_S, q_{ST}, q_T)\, d\Lambda(\beta,\lambda) = \int_{\mathcal{O}_1}\int f_{\beta,\lambda}(q_S, q_{ST}, q_T)\, d\Lambda_1(g \circ (\beta,\lambda))\, \nu(dg),$$

where $\nu$ is the Haar probability measure on the group $\mathcal{O}_1$: $\nu(\{1\}) = \nu(\{-1\}) = 1/2$. Because

$$\int f_{\beta,\lambda}(q_S, -q_{ST}, q_T)\, d\Lambda(\beta,\lambda) = \int f_{(-1)\circ(\beta,\lambda)}(q_S, q_{ST}, q_T)\, d\Lambda(\beta,\lambda) = \int f_{\beta,\lambda}(q_S, q_{ST}, q_T)\, d\Lambda(\beta,\lambda),$$

the weighted-average statistic based on (2.18) depends only on $q_S$, $|q_{ST}|$, and $q_T$. The MM2 statistic, however, is already sign invariant thanks to a clever choice of prior for $\beta$ and $\mu$. In fact, the MM2 prior was chosen so that the final statistic is sign invariant. Tests based on $h_2(s,t)$ are naturally two-sided tests for the null $H_0: \beta = \beta_0$ against the alternative $H_1: \beta \neq \beta_0$ when $\Sigma = \Omega \otimes \Phi$. This important property does not hold for standard tests based on $h_1(s,t)$. The WAP test (denoted ECS-HACIV) recently proposed by Olea (2015) is not sign invariant either. Sections 5 and 7 present numerical simulations showing that all these WAP similar tests can behave like one-sided tests for some parameter values. In the next section, we will discuss ways to circumvent this problem whether $\Sigma$ has a Kronecker product structure or not.


3 Weighted-Average Power Tests 

So far, we have only described test statistics. Coupled with critical values, we obtain the test procedures commonly used in the literature. The Anderson-Rubin test rejects the null when $AR > c(k)$, where $c(d)$ is the $1 - \alpha$ quantile of a chi-square distribution with $d$ degrees of freedom. The LM test rejects the null when $LM > c(1)$. A conditional test based on a statistic $\psi(S,T)$ rejects the null when $\psi(S,T) > \kappa(T)$. Each critical value function $\kappa(T)$ is the null conditional quantile of $\psi$ given $T = t$; see Moreira (2003) for details (we omit the dependence of the critical value function on the statistic $\psi$ when there is no ambiguity). For example, the CQLR test rejects the null when the QLR statistic defined in (2.7) is larger than the conditional critical value.
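Under $H_0$, $S \sim N(0, I_k)$ independently of $T$. The conditional critical value $\kappa(t)$ can therefore be approximated by simulating $S$ while holding $t$ fixed. The sketch below illustrates this generic simulation approach; it is not Moreira's (2003) own implementation. The function names are hypothetical. The second statistic is the homoskedastic-style score form $(S'T)^2/T'T$, used only for illustration because the HAC-IV expression (2.5) is not reproduced here.

```python
import numpy as np


def conditional_critical_value(psi, t, alpha=0.05, n_draws=100_000, seed=0):
    """Approximate kappa(t), the null 1 - alpha quantile of psi(S, t) given T = t.

    Under H0, S ~ N(0, I_k) independently of T, so S is simulated and t is held
    fixed. psi takes an (n_draws, k) array of S draws and the vector t.
    """
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n_draws, t.shape[0]))
    return float(np.quantile(psi(S, t), 1.0 - alpha))


def ar_psi(S, t):
    """AR = S'S for each simulated S (it does not depend on t)."""
    return np.sum(S ** 2, axis=1)


def lm_psi(S, t):
    """Homoskedastic-style score statistic (S't)^2 / t't, for illustration only."""
    return (S @ t) ** 2 / (t @ t)


t = np.array([1.0, -0.5, 2.0])
print(conditional_critical_value(ar_psi, t))   # near the chi-square(3) 95% point
print(conditional_critical_value(lm_psi, t))   # near the chi-square(1) 95% point
```

For statistics such as QLR, whose null distribution genuinely depends on $t$, the same routine returns a different $\kappa(t)$ for each observed $t$. That dependence is what makes the resulting tests similar.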

Our goal in this section is to find optimal tests. Specifically, a test is defined to be a measurable function $\phi(s,t)$ that is bounded by 0 and 1. For a given outcome, the test rejects the null with probability $\phi(s,t)$ and accepts the null with probability $1 - \phi(s,t)$; e.g., the Anderson-Rubin test is simply $I(AR > c(k))$, where $I(\cdot)$ is the indicator function. The test is said to be nonrandomized if $\phi$ only takes values 0 and 1; otherwise, it is called a randomized test. We note that

$$E_{\beta,\mu}\phi(S,T) = \int \phi(s,t)\, f_{\beta,\mu}(s,t)\, d(s,t)$$

is the probability of rejecting the null when the parameters are $\beta$ and $\mu$. The object $E_{\beta,\mu}\phi(S,T)$, taken as a function of $\beta$ and $\mu$, gives the power curve for the test $\phi$. In particular, $E_{\beta_0,\mu}\phi(S,T)$ gives the null rejection probability. By Tonelli's theorem, we can write

$$E_\Lambda\phi(S,T) = \int E_{\beta,\mu}\phi(S,T)\, d\Lambda(\beta,\mu) = \int \phi(s,t)\, h_\Lambda(s,t)\, d(s,t), \tag{3.19}$$

where $h_\Lambda(s,t)$ is defined in (2.8). Hence, $E_\Lambda\phi(S,T)$ is the weighted-average power for the measure $\Lambda(\beta,\mu)$.
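By (3.19), the weighted-average power of a test is its rejection probability when $(\beta, \mu)$ is first drawn from $\Lambda$ and $(S, T)$ is then drawn given $(\beta, \mu)$. The Monte Carlo sketch below uses this fact for the AR test. It assumes the MM1-type weight $N(\beta_0, 1) \times N(0, \sigma^2\Phi)$ and user-supplied $C_{\beta_0}$ and $\Phi$. The function name and inputs are hypothetical.

```python
import numpy as np


def wap_of_ar_test(C_beta0, Phi, beta0, sigma2, crit, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E_Lambda phi for phi = I(S'S > crit), using (3.19).

    (beta, mu) is drawn from the illustrative weight N(beta0, 1) x N(0, sigma2 * Phi),
    then S | (beta, mu) ~ N((beta - beta0) C_beta0 mu, I_k). Only S is simulated
    because the AR test does not depend on T.
    """
    rng = np.random.default_rng(seed)
    k = Phi.shape[0]
    beta = beta0 + rng.standard_normal(n_draws)
    mu = np.sqrt(sigma2) * rng.standard_normal((n_draws, k)) @ np.linalg.cholesky(Phi).T
    S = (beta - beta0)[:, None] * (mu @ C_beta0.T) + rng.standard_normal((n_draws, k))
    return float(np.mean(np.sum(S ** 2, axis=1) > crit))


k = 2
crit = -2.0 * np.log(0.05)     # 1 - 0.05 quantile of a chi-square(2)
print(wap_of_ar_test(np.eye(k), np.eye(k), 0.0, 0.0, crit))   # no signal: close to 0.05
print(wap_of_ar_test(np.eye(k), np.eye(k), 0.0, 4.0, crit))
```

With $\sigma^2 = 0$ the weight puts all mass on $\mu = 0$, so the estimate recovers the null rejection probability.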

A natural first step is to find tests that maximize WAP and have size no larger than $\alpha$. That is,

$$\max_{0 \le \phi \le 1} E_\Lambda\phi(S,T), \quad \text{where } E_{\beta_0,\mu}\phi(S,T) \le \alpha, \ \forall \mu. \tag{3.20}$$

Since the parameter $\mu$ is unknown, finding a WAP test with correct size is nontrivial. The task entails finding a least favorable distribution $\Lambda_0$ to construct the WAP test as described in Section 3.8 of Lehmann and Romano (2005). This test rejects the null when the likelihood ratio is large:

$$\frac{h_\Lambda(s,t)}{\int f_{\beta_0,\mu}(s,t)\, d\Lambda_0(\mu)} > \kappa_\Lambda, \tag{3.21}$$

where the measure $\kappa_\Lambda\Lambda_0$ is really a Lagrange multiplier in an infinite-dimensional space; see Lemma 3 of Moreira and Moreira (2010) for details.¹ For a parameter $\mu$ of small dimension, we can apply numerical algorithms to approximate the WAP test (such as the one by Elliott, Mueller, and Watson (2015) or the linear programming algorithm of Moreira and Moreira (2013)).

The task of finding tests with correct size is simplified if we can find optimal similar tests:

$$\max_{0 \le \phi \le 1} E_\Lambda\phi(S,T), \quad \text{where } E_{\beta_0,\mu}\phi(S,T) = \alpha, \ \forall \mu. \tag{3.22}$$

Because the statistic $T$ is sufficient and complete under the null, any similar test is conditionally similar (for almost all levels $T = t$). Hence, we can solve

$$\max_{0 \le \phi \le 1} E_\Lambda\phi(S,t), \quad \text{where } E_{\beta_0}\phi(S,t) = \alpha.$$

The WAP similar test rejects the null when

$$\frac{h_\Lambda(s,t)}{f^S_{\beta_0}(s)\cdot h^T_\Lambda(t)} > \kappa(t), \tag{3.23}$$

where $\kappa(t)$ is a conditional critical value function and $h^T_\Lambda(t) = \int h_\Lambda(s,t)\, ds$. By Tonelli's theorem,

$$h^T_\Lambda(t) = \int\int f^S_{\beta,\mu}(s)\, f^T_{\beta,\mu}(t)\, d\Lambda(\beta,\mu)\, ds = \int\left[\int f^S_{\beta,\mu}(s)\, ds\right] f^T_{\beta,\mu}(t)\, d\Lambda(\beta,\mu) = \int f^T_{\beta,\mu}(t)\, d\Lambda(\beta,\mu).$$

For arbitrary weights $\Lambda$, neither the WAP test with correct size nor the WAP similar test is guaranteed to have overall good power in finite samples.² Take for a moment the case where $\Sigma = \Omega \otimes \Phi$. The WAP tests based on $h_1(s,t)$ can have very low power for some parameter values. Because the WAP test with correct size and the WAP similar test based on the MM1 weight are not sign invariant, they can actually behave like one-sided tests for parts of the parameter space.

¹ Also available as Lemma 2 in the most recent version, Moreira and Moreira (2013). Both versions are available on Marcelo Moreira's website: http://www.fgv.br/professor/mjmoreira/

This issue is analogous to the problem with conditional Wald tests found by Andrews, Moreira, and Stock (2007), which leads them to give a very specific recommendation: "The evident conclusion for applied work is that researchers choosing among these tests (including conditional Wald) should use the CLR test. The strong asymptotic bias and often low power of the conditional Wald tests indicate that they can yield misleading inferences and are not useful, even as robustness checks." For our purposes, we can of course circumvent this problem by replacing $h_1(s,t)$ by the weighted-average density based on the sign invariant weight (2.18), or by the density $h_2(s,t)$. However, this solution relies on model symmetries (i.e., sign invariance) and only works for Kronecker covariance matrices.

On the other hand, Mills, Moreira, and Vilela (2014a) find approximately unbiased Wald tests which have overall good power. Their procedure only works for the model with homoskedastic errors, but it does hint that imposing additional constraints can actually help to obtain optimal tests with overall good power for general $\Sigma$.


4 Two-Sided Boundary Conditions 


The WAP similar test based on $h_2(s,t)$ is a two-sided test in the homoskedastic case precisely because the sign group of transformations preserves the two-sided testing problem when $\Sigma = \Omega \otimes \Phi$. More specifically, because this test depends only on $Q_S$, $|Q_{ST}|$, and $Q_T$, it is locally unbiased; see Corollary 1 of Andrews, Moreira, and Stock (2006b). When errors are autocorrelated and heteroskedastic, however, the covariance $\Sigma$ typically does not have a Kronecker product structure. In this case, the WAP similar test (or a WAP test with correct size) based on $h_2(s,t)$ may not have good power for parts of the parameter space. Worse yet, when the covariance matrix lacks a Kronecker product structure, there is actually no sign invariance argument to accommodate two-sided testing.

² As the geneticist and statistician Anthony W. F. Edwards (1992, p. 60) remarks, "It is sometimes said, in defence of the Bayesian concept, that the choice of prior distribution is unimportant in practice, because it hardly influences the posterior distribution at all when there are moderate amounts of data. The less said about this 'defence' the better."


Proposition 2. Assume that we cannot write $\Sigma$ as $\Omega \otimes \Phi$ for a $2 \times 2$ matrix $\Omega$ and a $k \times k$ matrix $\Phi$, both symmetric and positive definite. Then for the data group of transformations $[S, T] \to [gS, T]$, $g \in \{-1, 1\}$, there exists no group of transformations in the parameter space which preserves the testing problem.


Proposition 2 asserts that we cannot simplify the two-sided hypothesis testing problem using sign invariance arguments. It is then much more difficult to find a weight so that the test is, loosely speaking, two-sided. An unbiasedness condition instead adjusts the weights automatically (whether $\Sigma$ has a Kronecker product structure or not). Hence, we can seek approximately optimal unbiased tests.

An important property of WAP tests is admissibility. Theorem 1 below shows that the WAP unbiased tests are admissible. The proof follows exactly the same steps as the proof for admissibility of WAP similar tests of Moreira and Moreira (2013) (see Comment 1 after their Theorem 4).³ For completeness, we provide a proof of the following theorem in the appendix.


Theorem 1. Let $(\beta, \mu) \in B \times P$, where both sets are compact. Assume that the weight $\Lambda$ appearing in (2.8) has full support on $B \times P$. Then there exists a sequence of Bayes tests $\phi_n(s,t)$ which weakly converges (in the weak* topology) to the WAP unbiased test. In particular, the WAP unbiased test is admissible.

Comments: 1. The weak convergence guarantees, for example, that the limiting power function of $\phi_n(s,t)$ is the power function of the WAP unbiased test. See Moreira and Moreira (2013) for details on weak convergence of tests.

2. The theorem assumes the parameter space is compact. It may be possible to drop this assumption with some additional technical conditions; see Lehmann (1952). The compactness assumption, however, may not be overly restrictive in practice. First, one could argue that we can pin down a region large enough in which the parameter lies. Second, the usual mathematical and statistical software packages have limited numerical accuracy, so for all practical purposes the weight $\Lambda$ in the average density $h_\Lambda(s,t)$ has support in a compact set.

Proposition 2 shows that there is no sign group structure which preserves the null and alternative. This makes the task of finding a weight function $\Lambda$ whose density $h_\Lambda(s,t)$ yields a WAP unbiased test difficult with HAC errors. Instead of seeking a weight function $\Lambda$ so that the WAP test is approximately unbiased, we can select an arbitrary weight and find the optimal test among unbiased tests; see Moreira and Moreira (2013). In practice, it would be computationally intensive to handle so many constraints of the form $E_{\beta,\mu}\phi(S,T) \ge E_{\beta_0,\mu_0}\phi(S,T)$ for any scalar $\beta$ and $k$-dimensional vectors $\mu$ and $\mu_0$, especially when $k$ is large. Instead, we choose two different restrictions. The first condition is based on the local power around the null hypothesis. It is a weaker condition than unbiasedness, so it does not rule out admissibility. The second condition is a stronger requirement but is easier to implement. Better yet, numerical simulations will show that it yields little power reduction compared to the first condition. Both conditions and their associated WAP tests are presented next.

³ Olea (2015) provides an alternative proof, by contradiction, that similar tests are admissible.


4.1 Locally Unbiased (LU) Condition 

If the test is unbiased, the derivative of the power function must be equal to zero under the null. The next proposition uses this fact and the completeness of $T$ to provide a necessary condition for a test to be unbiased. This locally unbiased (LU) condition states that the test must be similar and uncorrelated with linear combinations (which depend on the instruments' coefficient $\mu$) of the pivotal statistic $S$.

Proposition 3. A test is said to be locally unbiased (LU) if

$$E_{\beta_0,\mu}\phi(S,T) = \alpha \quad \text{and} \quad E_{\beta_0,\mu}\phi(S,T)\,S'C_{\beta_0}\mu = 0, \ \forall \mu. \tag{LU}$$

If a test is unbiased, then it is LU.
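The sketch below illustrates the second condition in (LU) for tests that depend on the data only through $S$, a simplifying assumption. For such tests the moment reduces to $E_{\beta_0}[\phi(S)S]'C_{\beta_0}\mu$ with $S \sim N(0, I_k)$. An AR-type test, which is even in $S$, makes this moment zero. A one-sided test does not. The vector `v` stands in for $C_{\beta_0}\mu$, and all names are hypothetical.

```python
import numpy as np


def lu_moment(phi, v, k, n_draws=400_000, seed=0):
    """Monte Carlo estimate of E_{beta0}[phi(S) S'v] with S ~ N(0, I_k) under H0."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n_draws, k))
    return float(np.mean(phi(S) * (S @ v)))


def ar_test(S):
    """AR test with k = 3: reject when S'S exceeds the chi-square(3) 95% point."""
    return (np.sum(S ** 2, axis=1) > 7.8147).astype(float)


def one_sided(S):
    """A one-sided test that rejects when the first coordinate of S exceeds 1.6449."""
    return (S[:, 0] > 1.6449).astype(float)


k = 3
v = np.array([1.0, 0.0, 0.0])      # plays the role of C_{beta0} mu
print(lu_moment(ar_test, v, k))    # near zero: consistent with LU
print(lu_moment(one_sided, v, k))  # near the N(0, 1) density at 1.6449: violates LU
```

Both tests are similar at level 0.05, so the contrast comes entirely from the correlation condition. That condition is what rules out one-sided behavior.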


In the case $k = 1$, where the model is exactly identified, we have an optimality result for any choice of $\Lambda$. The Anderson-Rubin test is the uniformly most powerful unbiased (UMPU) test and has a power function depending on the noncentrality parameter $(\beta - \beta_0)^2 C_{\beta_0}^2\mu^2$. We can prove this result directly from Theorem 2-(a) of Moreira (2001, 2009) for homoskedastic errors (with the matrix $\Omega$ being replaced by $\Sigma$). As this setup resembles the just-identified model with homoskedastic errors, optimality of the Anderson-Rubin test for HAC errors and $k = 1$ follows straightforwardly.

Proposition 4. If $k = 1$, the Anderson-Rubin test is the uniformly most powerful unbiased test and has a power function given by

$$1 - G\left(c(1); (\beta - \beta_0)^2 C_{\beta_0}^2\mu^2\right),$$

where $G(\cdot; \delta^2)$ is the noncentral $\chi^2(1)$ distribution function with noncentrality parameter $\delta^2$. Furthermore, the LM and CQLR tests are equivalent to the Anderson-Rubin test, and are also optimal.
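The power function in Proposition 4 can be evaluated with SciPy's noncentral chi-square distribution. The sketch below treats $C_{\beta_0}$ and $\mu$ as given scalars. The function name and the example values are hypothetical.

```python
from scipy.stats import chi2, ncx2


def ar_power_k1(beta, beta0, c_beta0, mu, alpha=0.05):
    """Power 1 - G(c(1); delta^2) of the AR test when k = 1 (Proposition 4)."""
    delta2 = (beta - beta0) ** 2 * c_beta0 ** 2 * mu ** 2
    crit = chi2.ppf(1.0 - alpha, df=1)                  # c(1)
    if delta2 == 0.0:
        return float(chi2.sf(crit, df=1))              # central case: equals alpha
    return float(ncx2.sf(crit, df=1, nc=delta2))


for beta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(beta, ar_power_k1(beta, beta0=0.0, c_beta0=1.0, mu=2.0))
```

The power depends on $\beta$ only through $(\beta - \beta_0)^2$. The curve is therefore symmetric around $\beta_0$, never falls below $\alpha$, and increases in $|\beta - \beta_0|$, as expected of an unbiased two-sided test.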
Following Proposition 3, the WAP-LU test solves

$$\max_{0 \le \phi \le 1} E_\Lambda\phi(S,T), \quad \text{where } E_{\beta_0,\mu}\phi(S,T) = \alpha \text{ and } E_{\beta_0,\mu}\phi(S,T)\,S'C_{\beta_0}\mu = 0, \ \forall \mu. \tag{4.24}$$

The optimal tests based on $h_1(s,t)$ and $h_2(s,t)$ are denoted MM1-LU and MM2-LU tests, respectively. In the just-identified model, the MM1-LU test is shown to be the uniformly most powerful unbiased test. The MM2-LU test is equivalent to the MM2 similar test and is also optimal.


Proposition 5. The following hold when $k = 1$:

(a) The MM2-LU and MM2 similar tests are equivalent and uniformly most powerful unbiased tests.

(b) Both MM1-LU and MM2-LU tests are uniformly most powerful unbiased tests.

Comments: 1. The MM2 similar test automatically satisfies the LU condition when $k = 1$. Hence, the MM2-LU and MM2 similar tests are equivalent when the model is exactly identified.

2. The MM1 similar test is not locally unbiased even when $k = 1$. Close inspection of the weighted density $h_1(s,t)$ shows that $d_\beta/c_\beta$ is the relative contribution of the one-sided $S \cdot T$ statistic to the $AR = S^2$ statistic. If $\Sigma$ is close to being singular (that is, $|\Sigma|$ is near zero), the ratio $d_\beta/c_\beta$ can diverge to infinity. The MM1 test can then behave as a one-sided test. We will illustrate this problem numerically in Section 5.


In the case $k > 1$, where the model is overidentified, we no longer have a uniformly most powerful unbiased test. However, we can still find WAP tests which are locally unbiased. Relaxing both constraints in (4.24) assures us the existence of Lagrange multipliers; see Moreira and Moreira (2013). Therefore, we solve the approximated maximization problem

$$\max_{0 \le \phi \le 1} E_\Lambda\,\phi(S,T), \quad \text{where } \alpha - \epsilon \le E_{\beta_0,\mu}\,\phi(S,T) \le \alpha + \epsilon, \ \forall \mu,$$
$$\text{and } E_{\beta_0,\mu_l}\,\phi(S,T)\,S'C_{\beta_0}\mu_l = 0, \ \text{for } l = 1, \dots, m, \qquad (4.25)$$

when $\epsilon$ is small and the number of discretizations $m$ is large. The optimal test rejects the null hypothesis when

$$h_\Lambda(s,t) - \sum_{l=1}^{m} c_l\, s'C_{\beta_0}\mu_l\, f_{\beta_0,\mu_l}(s,t) > \int f_{\beta_0,\mu}(s,t)\, d\Lambda_\epsilon(\mu), \qquad (4.26)$$

where the measure $\Lambda_\epsilon$ and the scalars $c_l$, $l = 1, \dots, m$, are multipliers associated with the boundary constraints in the maximization problem (4.25).

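We can use $f_{\beta_0,\mu}(s,t) = f^S_{\beta_0}(s) \times f^T_{\beta_0,\mu}(t)$ to write (4.26) as

$$\frac{h_\Lambda(s,t)}{f^S_{\beta_0}(s)} - \sum_{l=1}^{m} c_l\, s'C_{\beta_0}\mu_l\, f^T_{\beta_0,\mu_l}(t) > \int f^T_{\beta_0,\mu}(t)\, d\Lambda_\epsilon(\mu). \qquad (4.27)$$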
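Letting $\epsilon \to 0$, the optimal test rejects the null hypothesis when

$$\frac{h_\Lambda(s,t)}{f^S_{\beta_0}(s)} - \sum_{l=1}^{m} c_l\, s'C_{\beta_0}\mu_l\, f^T_{\beta_0,\mu_l}(t) > \kappa(t), \qquad (4.28)$$

where $\kappa(t)$ is the conditional $1 - \alpha$ quantile of

$$\frac{h_\Lambda(S,t)}{f^S_{\beta_0}(S)} - \sum_{l=1}^{m} c_l\, S'C_{\beta_0}\mu_l\, f^T_{\beta_0,\mu_l}(t). \qquad (4.29)$$

This representation is very convenient, as we can find

$$\kappa(t) = \lim_{\epsilon \to 0} \int f^T_{\beta_0,\mu}(t)\, d\Lambda_\epsilon(\mu) \qquad (4.30)$$

by numerical approximations of the conditional distribution instead of searching for an infinite-dimensional multiplier $\Lambda_\epsilon$. We then search for the values $c_l$ so that

$$E_{\beta_0,\mu_l}\,\phi(S,T)\,S'C_{\beta_0}\mu_l = \int \phi(s,t)\, s'C_{\beta_0}\mu_l\, f^S_{\beta_0}(s)\, f^T_{\beta_0,\mu_l}(t)\, ds\, dt = 0, \qquad (4.31)$$

taking into consideration that $\kappa(t)$ depends on $c_l$, $l = 1, \dots, m$. We can find $c_l$, $l = 1, \dots, m$, with a nonlinear numerical algorithm.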

As an alternative procedure, we consider a condition stronger than the LU condition which is simpler to implement numerically. This strategy turns out to be useful because it provides a simple way to implement tests with overall good power. We explain this alternative condition next.


4.2 Strongly Unbiased (SU) Condition 

The LU condition asserts that the test $\phi$ is uncorrelated with a linear combination indexed by the instruments' coefficients $\mu$ and the pivotal statistic $S$. We note that the LU condition trivially holds if

$$E_{\beta_0,\mu}\,\phi(S,T) = \alpha \quad \text{and} \quad E_{\beta_0,\mu}\,\phi(S,T)\,S = 0, \quad \forall \mu. \qquad \text{(SU)}$$

That is, the test $\phi$ is uncorrelated with the $k$-dimensional statistic $S$ itself under the null. This strongly unbiased (SU) condition states that the test $\phi(S,T)$ is uncorrelated with $S$ for all instruments' coefficients $\mu$. The WAP-SU test based on the weight $\Lambda$ solves

$$\max_{0 \le \phi \le 1} E_\Lambda\,\phi(S,T), \quad \text{where } E_{\beta_0,\mu}\,\phi(S,T) = \alpha \text{ and } E_{\beta_0,\mu}\,\phi(S,T)\,S = 0, \ \forall \mu. \qquad (4.32)$$

The two-step procedure used above to find the multipliers $c_l$ of the WAP-LU test is the usual substitution method for a system of equations, but here we have an uncountable number of equations and unknowns.
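The SU condition can be checked by simulation. The following is a minimal Monte Carlo sketch, not the authors' code: it fixes $T = t$ (so a test is a function of $S$ alone), draws $S \sim N(0, I_k)$, and estimates $E\,\phi(S)$ and $E\,\phi(S)S$. The direction `u`, standing in for $C_{\beta_0}\mu/\|C_{\beta_0}\mu\|$, and the two example tests are hypothetical choices: a symmetric two-sided test satisfies the SU moment condition, whereas a one-sided test in the same direction does not.

```python
import numpy as np
from statistics import NormalDist


def null_moments(reject, k, n_draws=200_000, seed=0):
    """Monte Carlo estimates of E[phi(S)] and E[phi(S) S] for S ~ N(0, I_k).

    `reject` maps an (n, k) array of draws to values of phi in [0, 1];
    T = t is held fixed, so phi depends on S only.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n_draws, k))
    phi = np.asarray(reject(s), dtype=float)
    size = phi.mean()                             # should equal alpha
    su_moment = (phi[:, None] * s).mean(axis=0)   # SU requires the zero vector
    return size, su_moment


k = 3
u = np.ones(k) / np.sqrt(k)                       # hypothetical unit direction C mu / ||C mu||
z_two = NormalDist().inv_cdf(0.975)
z_one = NormalDist().inv_cdf(0.95)

size2, m2 = null_moments(lambda s: (s @ u) ** 2 > z_two**2, k)   # two-sided test
size1, m1 = null_moments(lambda s: s @ u > z_one, k)             # one-sided test
```

Both tests have size close to 5%, but only the two-sided one has `m2` close to the zero vector; for the one-sided test, `m1 @ u` is close to the standard normal density at $z_{0.95}$, roughly 0.10.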
The optimal tests based on $h_1(s,t)$ and $h_2(s,t)$ are denoted respectively MM1-SU and MM2-SU tests.

When $k = 1$, the LU and SU conditions are equivalent (hence, the MM1-SU and MM2-SU tests are uniformly most powerful unbiased). When $k > 1$, the following lemma proves that the LU condition is strictly weaker than the SU condition. Hence, finding WAP similar tests that satisfy the SU instead of the LU condition may, in theory, entail unnecessary power losses. In practice, the numerical simulations reported below indicate that there is little power gain, if any, from using the LU instead of the SU condition (with the MM1-SU and MM2-SU tests having the advantage of being easier to implement).


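Lemma 1. Define the integral

$$F_\phi(\mu_1, \mu_2) = E_{\beta_0,\mu_1}\,\phi(S,T)\,S'C_{\beta_0}\mu_2 = \int \phi(s,t)\, s'C_{\beta_0}\mu_2\, f_{\beta_0,\mu_1}(s,t)\, ds\, dt.$$

For $k > 1$, there exists a test function $\phi: (S,T) \to [0,1]$ such that $F_\phi(\mu, \mu) = 0$ for all $\mu$ and $F_\phi(\mu_1, \mu_2) \neq 0$ for some $\mu_1$ and $\mu_2$.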


Because the statistic $T$ is complete, we can carry out power maximization in (4.32) for each level of $T = t$:

$$\max_{0 \le \phi \le 1} E_\Lambda\,\phi(S,t), \quad \text{where } E_{\beta_0}\,\phi(S,t) = \alpha \text{ and } E_{\beta_0}\,\phi(S,t)\,S = 0, \qquad (4.33)$$

where the expectation is taken with respect to $S$ only. The WAP-SU test rejects the null when

$$\frac{h_\Lambda(s,t)}{f^S_{\beta_0}(s) \cdot h^T_\Lambda(t)} > \kappa(s,t),$$

where the function $\kappa(s,t) = \kappa_0(t) + s'\kappa_1(t)$ is such that the optimal test satisfies the SU condition. The term $h^T_\Lambda(t)$ can be absorbed in the critical value function. For numerical stability, however, we recommend keeping it so that the numerator and denominator are of the same order of magnitude.

In practice, we can find $\kappa_0(t)$ and $\kappa_1(t)$ using linear programming based on simulations for the statistic $S$. Consider the approximated problem

$$\max_{0 \le x(j) \le 1} \; J^{-1} \sum_{j=1}^{J} x(j)\, (2\pi)^{k/2} \exp\left(\|s^{(j)}\|^2/2\right) \frac{h_\Lambda(s^{(j)},t)}{h^T_\Lambda(t)}$$
$$\text{s.t. } J^{-1} \sum_{j=1}^{J} x(j) = \alpha \ \text{ and } \ J^{-1} \sum_{j=1}^{J} x(j)\, s_l^{(j)} = 0, \ \text{for } l = 1, \dots, k.$$

Each $j$-th draw of $S$ is iid standard normal:

$$S^{(j)} = \left(S_1^{(j)}, \dots, S_k^{(j)}\right)' \sim N(0, I_k).$$

We note that for the linear programming problem, the only term which depends on $T = t$ is the ratio $h_\Lambda(s^{(j)},t)/h^T_\Lambda(t)$. The multipliers for this linear programming problem are the critical value functions $\kappa_0(t)$ and $\kappa_1(t)$. To speed up the numerical algorithm, we can use the same sample $s^{(j)}$, $j = 1, \dots, J$, for every level $T = t$.
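The approximated linear program above can be sketched with SciPy's `linprog` (HiGHS solver). This is an illustration under stated assumptions rather than the authors' implementation: the objective weights `ratio` are a toy stand-in (the likelihood ratio of a hypothetical point alternative $N(b, I_k)$ against $N(0, I_k)$), not $h_1$ or $h_2$, and the draws are made antithetic ($s$ and $-s$) so that the constant test $x(j) = \alpha$ is always feasible.

```python
import numpy as np
from scipy.optimize import linprog


def su_linear_program(ratio, s, alpha=0.05):
    """Approximate WAP-SU test at one level T = t via linear programming.

    ratio: length-J array of objective weights, standing in for
           h_Lambda(s_j, t) / (f_S(s_j) * h_T(t)).
    s:     (J, k) array of null draws of S.
    Returns x in [0, 1]^J; x[j] is the rejection probability at draw j.
    """
    J, k = s.shape
    c = -np.asarray(ratio, dtype=float) / J          # linprog minimizes, so negate
    A_eq = np.vstack([np.ones(J), s.T]) / J          # size row, then k SU rows
    b_eq = np.concatenate([[alpha], np.zeros(k)])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


rng = np.random.default_rng(1)
z = rng.standard_normal((2_000, 2))
s = np.vstack([z, -z])                  # antithetic draws make x = alpha feasible
b = np.array([1.0, 0.5])                # hypothetical point alternative N(b, I_2)
ratio = np.exp(s @ b - b @ b / 2)       # toy likelihood ratio, not h_1 or h_2
x = su_linear_program(ratio, s)
```

The draws with `x` near one form the rejection region; in this toy example they lie in both tails of $b's$, since the SU constraint rules out a one-sided region. The multipliers $\kappa_0(t)$ and $\kappa_1(t)$ in the text correspond to the dual values of the size and SU constraints (up to the solver's sign convention); they are not extracted here.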


Finally, we use the WAP test found in (4.33) to find a useful two-sided power envelope. The next proposition finds the optimal test for any given alternative which satisfies the SU condition.

Proposition 6. The optimal SU test for a point alternative $(\beta, \mu)$ rejects the null hypothesis when

$$\frac{\left(\mu'C_{\beta_0}S\right)^2}{\mu'C_{\beta_0}^2\mu} > c(1). \qquad (4.34)$$

This test is denoted the Point Optimal Strongly Unbiased (POSU) test and has power given by

$$P_{\beta,\mu}\left(\frac{\left(\mu'C_{\beta_0}S\right)^2}{\mu'C_{\beta_0}^2\mu} > c(1)\right) = 1 - G\left(c(1); (\beta - \beta_0)^2\, \mu'C_{\beta_0}^2\mu\right),$$

where $G(\cdot; \delta^2)$ is the noncentral $\chi^2(1)$ distribution function with noncentrality parameter $\delta^2$.

Comments: 1. The POSU test does not depend on $\beta$ but does depend on the direction of the vector $C_{\beta_0}\mu$.

2. When $k = 1$, the Anderson-Rubin and POSU tests are the same.

The plot of $1 - G\left(c(1); (\beta - \beta_0)^2\, \mu'C_{\beta_0}^2\mu\right)$ as $\beta$ and $\mu$ change yields the two-sided power envelope. This power envelope is the two-sided analogue of the one-sided power envelope among similar tests. This one-sided power upper bound, based on the Point Optimal Similar (POS) test for the alternative $(\beta, \mu)$, is given by the plot of $1 - \Phi\left(z_{1-\alpha} - |\beta - \beta_0|\,(\mu'C_{\beta_0}^2\mu)^{1/2}\right)$, where $\Phi(\cdot)$ is the standard normal distribution function and $z_{1-\alpha}$ is its $1 - \alpha$ quantile.
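Both bounds in Proposition 6 and the comparison that follows have closed forms. The sketch below evaluates them for a given alternative; it assumes a symmetric $C_{\beta_0}$ (passed as the hypothetical argument `C`), and the function name and example values are illustrative, not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

_N = NormalDist()


def power_envelopes(beta, beta0, mu, C, alpha=0.05):
    """Two-sided (POSU) and one-sided (POS) power bounds at the alternative (beta, mu).

    C plays the role of the symmetric matrix C_{beta0}. The standardized statistic
    mu' C S / (mu' C^2 mu)^{1/2} is N(m, 1) with m = (beta - beta0) (mu' C^2 mu)^{1/2}.
    """
    Cmu = C @ mu
    m = (beta - beta0) * float(np.sqrt(Cmu @ Cmu))            # (mu' C^2 mu)^{1/2} = ||C mu||
    z_two = _N.inv_cdf(1 - alpha / 2)                         # sqrt(c(1))
    two_sided = (1 - _N.cdf(z_two - m)) + _N.cdf(-z_two - m)  # 1 - G(c(1); m^2)
    one_sided = 1 - _N.cdf(_N.inv_cdf(1 - alpha) - abs(m))    # POS bound
    return two_sided, one_sided


C = np.diag([1.0, 0.5])        # hypothetical C_{beta0}
mu = np.array([1.0, 2.0])      # hypothetical instruments' coefficients
envelope = [power_envelopes(b, 0.0, mu, C) for b in (0.0, 1.0, 2.0)]
```

As the Neyman-Pearson lemma implies, the one-sided bound lies above the two-sided one at every alternative, and both equal $\alpha$ at $\beta = \beta_0$.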

5 Numerical Evaluation of WAP Tests 

In this section, we provide numerical simulations for WAP tests based on the MM statistics. The MM tests are WAP similar tests based on $h_1(s,t)$ and $h_2(s,t)$. The MM-LU and MM-SU tests also satisfy respectively the locally unbiased and strongly unbiased conditions. The goal in this section is to numerically illustrate the importance of using two-sided conditions to obtain tests with overall good power.

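We can write

$$\Omega = \begin{bmatrix} \omega_{11}^{1/2} & 0 \\ 0 & \omega_{22}^{1/2} \end{bmatrix} P_\Omega \begin{bmatrix} 1 + \rho & 0 \\ 0 & 1 - \rho \end{bmatrix} P_\Omega' \begin{bmatrix} \omega_{11}^{1/2} & 0 \\ 0 & \omega_{22}^{1/2} \end{bmatrix},$$

where $P_\Omega$ is an orthogonal matrix and $\rho = \omega_{12}/(\omega_{11}\omega_{22})^{1/2}$. For the numerical simulations, we specify $\omega_{11} = \omega_{22} = 1$.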

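We use the decomposition of $\Omega$ to perform numerical simulations for a class of covariance matrices:

$$\Sigma = \left(P_\Omega \begin{bmatrix} 1 + \rho & 0 \\ 0 & 0 \end{bmatrix} P_\Omega'\right) \otimes \operatorname{diag}(\varsigma_1) + \left(P_\Omega \begin{bmatrix} 0 & 0 \\ 0 & 1 - \rho \end{bmatrix} P_\Omega'\right) \otimes \operatorname{diag}(\varsigma_2),$$

where $\varsigma_1$ and $\varsigma_2$ are $k$-dimensional vectors.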

We consider two possible choices for $\varsigma_1$ and $\varsigma_2$. For the first design, we set $\varsigma_1 = \varsigma_2 = (1/\epsilon - 1, 1, \dots, 1)'$. The covariance matrix then simplifies to a Kronecker product: $\Sigma = \Omega \otimes \operatorname{diag}(\varsigma_1)$. For the non-Kronecker design, we set $\varsigma_1 = (1/\epsilon - 1, 1, \dots, 1)'$ and $\varsigma_2 = (1, \dots, 1, 1/\epsilon - 1)'$. This setup captures the data asymmetry in extracting information about the parameter $\beta$ from each instrument. For small $\epsilon$, the angle between $\varsigma_1$ and $\varsigma_2$ is nearly $90^\circ$. We report numerical simulations for $\epsilon = (k + 1)^{-1}$. As $k$ increases, the vector $\varsigma_1$ becomes orthogonal to $\varsigma_2$ in the non-Kronecker design.

We set the parameter $\mu = (\lambda/k)^{1/2}\, 1_k$ for $k = 2, 5, 10, 20$ and $\rho = -0.5, 0.2, 0.5, 0.9$. We choose $\lambda/k = 0.5, 1, 2, 4, 8, 16$, which span the range from weak to strong instruments. We focus on tests with significance level 5% for testing $\beta_0 = 0$. To conserve space, we report here only power plots for $k = 5$, $\rho = 0.9$, and $\lambda/k = 2, 8$. The full set of simulations is available on Marcelo Moreira's website.
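The two designs can be built directly from the formulas above. A minimal NumPy sketch, assuming $\omega_{11} = \omega_{22} = 1$ and $\epsilon = (k+1)^{-1}$ as in the text; with unit variances, the orthogonal matrix $P_\Omega$ can be taken to have columns $(1, 1)'/\sqrt{2}$ and $(1, -1)'/\sqrt{2}$, the eigenvectors of $\Omega$. The helper names are hypothetical, and the instruments' coefficients follow the $\mu = (\lambda/k)^{1/2} 1_k$ specification.

```python
import numpy as np


def design_sigma(k, rho, kronecker=True):
    """Covariance Sigma for the Kronecker / non-Kronecker designs (omega11 = omega22 = 1)."""
    eps = 1.0 / (k + 1)
    # Columns are the eigenvectors of [[1, rho], [rho, 1]]; this P is orthogonal.
    p = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    a1 = p @ np.diag([1.0 + rho, 0.0]) @ p.T
    a2 = p @ np.diag([0.0, 1.0 - rho]) @ p.T
    vs1 = np.ones(k)
    vs1[0] = 1.0 / eps - 1.0              # varsigma_1 = (1/eps - 1, 1, ..., 1)'
    if kronecker:
        vs2 = vs1.copy()                   # varsigma_2 = varsigma_1
    else:
        vs2 = np.ones(k)
        vs2[-1] = 1.0 / eps - 1.0          # varsigma_2 = (1, ..., 1, 1/eps - 1)'
    return np.kron(a1, np.diag(vs1)) + np.kron(a2, np.diag(vs2))


def design_mu(k, lam_over_k):
    """Instruments' coefficients mu = (lambda/k)^{1/2} 1_k, so that mu'mu = lambda."""
    return np.sqrt(lam_over_k) * np.ones(k)


sigma_kron = design_sigma(5, 0.9)
sigma_nonkron = design_sigma(5, 0.9, kronecker=False)
```

In the Kronecker case the result coincides with `np.kron(Omega, np.diag(varsigma_1))`, which is a convenient sanity check on the decomposition.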

We present plots for the power envelope and power functions against various alternative values of $\beta$ and $\lambda$. All results reported here are based on 1,000 Monte Carlo simulations. We plot power as a function of the rescaled alternative $(\beta - \beta_0)\lambda^{1/2}$, which reflects the difficulty in making inference on $\beta$ for different instruments' strength.

The figure below reports numerical results for the Kronecker product design. All four pictures present the power envelope and power curves for two existing tests, the Anderson-Rubin (AR) and score (LM) tests.

The first two graphs plot the power curves for the three WAP tests based on the MM1 statistic with $\sigma^2 = 10$. All three tests reject the null when the $h_1(s,t)$ statistic is larger than an adjusted critical value function. In practice, we approximate these critical value functions with 10,000 replications. The MM1 test sets the critical value function to be the 95% empirical quantile of $h_1(S,t)$. The MM1-SU test uses a conditional linear programming algorithm to find its critical value function. The MM1-LU test uses a nonlinear optimization package.
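A conditional critical value such as the 95% empirical quantile of $h_1(S,t)$ is obtained by simulating $S$ under the null with $t$ held fixed. The sketch below shows this step only; since $h_1$ is not reproduced here, the statistic passed in is a hypothetical stand-in, $(t'S)^2/\|t\|^2$, whose null distribution is $\chi^2(1)$ for every $t$, so its quantile should be close to 3.84.

```python
import numpy as np


def conditional_critical_value(stat, t, k, alpha=0.05, n_draws=10_000, seed=0):
    """Empirical 1 - alpha quantile of stat(S, t), with S ~ N(0, I_k) and t held fixed."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n_draws, k))   # null draws of S; T = t is conditioned on
    return float(np.quantile(stat(s, t), 1 - alpha))


def stand_in_stat(s, t):
    """Hypothetical stand-in for h_1(s, t): (t's)^2 / ||t||^2, chi-square(1) under the null."""
    return (s @ t) ** 2 / (t @ t)


t = np.array([1.0, 2.0, 0.0, 0.0, -1.0])
kappa_t = conditional_critical_value(stand_in_stat, t, k=5, n_draws=100_000)
```

Because $S$ is independent of $T$ under the null, the same draws of $S$ can be reused across values of $t$; only the statistic changes.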
Figure 1: Power Comparison (Kronecker Variance)



The AR test has power considerably lower than the power envelope when instruments are both weak ($\lambda/k = 2$) and strong ($\lambda/k = 8$). The LM test does not perform well when instruments are weak, and its power function is not monotonic even when instruments are strong. These two facts about the AR and LM tests are well documented in the literature; see Moreira (2003) and AMS06. The figure also reveals some salient findings for the tests based on the MM1 statistic. First, all MM1-based tests have correct size. Second, the MM1 similar test can have large bias, to the point that it has zero power in parts of the parameter space. Hence, a naive choice for the density can yield a WAP test with overall poor power. We can eliminate this problem by imposing an unbiasedness condition when selecting an optimal test. The MM1-SU test is easy to implement and has power closer to the power upper bound. When instruments are weak, its power lies moderately below the reported power envelope. This is expected, as the number of parameters is too large. When instruments are strong, its power is virtually the same as the power envelope.

The MM1-SU power is nevertheless close to the two-sided power envelope for orthogonally invariant tests as in AMS06 (which is applicable to this design, but not reported here).

To support the use of the MM1-SU test, we also consider the MM1-LU test, which imposes a weaker unbiasedness condition. Close inspection of the graphs shows that the derivative of the power function of the MM1 test is different from zero at $\beta = \beta_0$. This observation suggests that the power curve of the WAP test would change considerably if we were to force the power derivative to be zero at $\beta = \beta_0$. Indeed, we implement the MM1-LU test where the locally unbiased condition holds at only one point, the true parameter $\mu$. This parameter is of course unknown to the researcher, so this test is not feasible. However, if we imposed the locally unbiased condition at other values of the instruments' coefficients as well, the power of the WAP test would be smaller, not larger. The power curves of the MM1-LU and MM1-SU tests are very close, which shows that there is not much to be gained by relaxing the strongly unbiased condition.

The last two graphs plot the power curves for the three WAP tests based on the MM2 statistic with $\zeta = 10$. By using the density $h_2(s,t)$, we avoid the pitfalls of the MM1 test. Recall that $h_2(s,t)$ is invariant to those data transformations which preserve the two-sided hypothesis testing problem. Hence, the MM2 similar test is unbiased and has overall good power without imposing any additional unbiasedness conditions. The graphs illustrate this theoretical finding, as the MM2, MM2-SU, and MM2-LU tests have numerically the same power curves. This conclusion changes dramatically when the covariance matrix is no longer a Kronecker product.

Figure 2: Power Comparison (Non-Kronecker Variance). Panels: $\lambda/k = 2$ and $\lambda/k = 8$.



This figure presents the power curves for all reported tests for the non-Kronecker design. Both MM1 and MM2 tests are severely biased and have overall poor power. For each design, we can make the tests approximately unbiased by choosing the $\sigma^2$ and $\zeta$ parameters large enough. However, this unbiasedness control is pointwise in the parameter space. We can always find a design such that each test behaves as a one-sided test and has very low power in parts of the parameter space. Hence, the strong asymptotic bias and often-low power of the conditional Wald tests found by Andrews, Moreira, and Stock (2007) also hold for the MM1 similar test (even for the homoskedastic IV model) and the MM2 similar test (only for the HAC-IV model). These WAP similar tests are highly biased, with power equal to zero in some parts of the parameter space. Therefore, just as Andrews, Moreira, and Stock (2007) object to the use of conditional Wald tests, we do not recommend the MM1 and MM2 similar tests to empirical researchers.

An earlier proposition shows that we cannot find a group of data transformations which preserve the two-sided testing problem with heteroskedastic-autocorrelated errors. Hence, a choice of density for the WAP test based on symmetry considerations is not obvious. The correct density choice can be particularly difficult due to the large parameter dimension (the coefficients $\mu$ and covariance $\Sigma$). Instead, we can endogenize the weight choice so that the WAP test will be automatically unbiased. This is done by the MM1-LU and MM2-LU tests. These two tests perform as well as the MM1-SU and MM2-SU tests. Because the latter two tests are easy to implement, we recommend their use in empirical practice.


6 Asymptotic Theory 

All theoretical and numerical results so far do not rely on the sample size $n$ at all, as we have assumed the statistics $S$ and $T$ to be exactly normally distributed with known variance $\Sigma$. In this section, we relax this assumption at the cost of asymptotic approximations.

Let $Z_i$ and $v_i$ denote the $i$th row of $Z$ and $V$, respectively, written as column vectors of dimensions $k$ and 2. We make the following two assumptions as the sample size $n$ grows.

Assumption 1. $n^{-1}Z'Z \to_p D_Z$ for some positive definite $k \times k$ matrix $D_Z$.

Assumption 2. $n^{-1/2} \sum_{i=1}^{n} (v_i \otimes Z_i) \to_d N(0, \Sigma_\infty)$ for some positive definite $2k \times 2k$ matrix $\Sigma_\infty$.

Assumption 1 holds under Birkhoff's Ergodic Theorem. Assumption 2 holds under suitable conditions by a central limit theorem (CLT). It also assumes that the long-run covariance matrix $\Sigma_\infty$ is positive definite, as is usual in the literature. We no longer omit the dependence of $\Sigma$ on the sample size $n$ and, hereinafter, write $\Sigma_n$. Assumption 2 asserts that $\Sigma_\infty$ is the limit of $\Sigma_n$ as $n$ grows. Let $\widehat\Sigma_n$ be a consistent estimator of $\Sigma_\infty$ based on $\{(\hat v_i \otimes Z_i) : i \le n\}$, where $\hat v_i$ are reduced-form residuals. There are many HAC estimators in the literature that can be used for this purpose; see, e.g., Newey and West (1987) and Andrews (1991). For brevity, we do not provide an explicit set of conditions under which one or more of these HAC estimators is consistent; see Jansson (2002) for details. We note, however, that the presence of weak instruments does not complicate standard proofs of the consistency of HAC estimators. Indeed, the convergence for most estimators holds uniformly over all true parameters $\beta$ and $\pi$.
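A common choice for $\widehat\Sigma_n$ (used with three lags in the power comparison below) is the Newey-West estimator with Bartlett weights $1 - j/(L+1)$. The sketch below is a generic implementation, not the authors' code: `newey_west` estimates the long-run variance of $n^{-1/2}\sum_i f_i$, and the hypothetical helper `hac_iv_moments` builds $f_i = \hat v_i \otimes Z_i$ from reduced-form OLS residuals, matching the ordering of $\operatorname{vec}(Z'V)$. Any rescaling needed to match the normalization of $R_n$ is left to the user.

```python
import numpy as np


def newey_west(f, lags):
    """Bartlett-kernel (Newey-West) long-run variance of n^{-1/2} * sum_i f_i.

    f: (n, p) array whose i-th row is the moment contribution f_i.
    lags: truncation lag L; autocovariance j gets weight 1 - j / (L + 1).
    """
    f = np.asarray(f, dtype=float)
    n = f.shape[0]
    omega = f.T @ f / n                               # Gamma_0
    for j in range(1, lags + 1):
        gamma_j = f[j:].T @ f[:-j] / n                # Gamma_j = n^{-1} sum_{i>j} f_i f_{i-j}'
        omega += (1.0 - j / (lags + 1.0)) * (gamma_j + gamma_j.T)
    return omega


def hac_iv_moments(Y, Z):
    """Rows f_i = vhat_i kron Z_i, with vhat_i the reduced-form OLS residuals."""
    vhat = Y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)
    # (n, 2, k) outer products flattened row-wise give kron(vhat_i, Z_i).
    return np.einsum("ia,ib->iab", vhat, Z).reshape(Z.shape[0], -1)
```

The Bartlett weights guarantee a positive semidefinite estimate, which matters here because $\widehat\Sigma_n$ is inverted and square-rooted when forming the feasible statistics.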

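We now introduce feasible versions of $S_n$ and $T_n$ with the variance replaced by the estimator $\widehat\Sigma_n$:

$$\widehat S_n = \left[(b_0' \otimes I_k)\,\widehat\Sigma_n\,(b_0 \otimes I_k)\right]^{-1/2} (b_0' \otimes I_k)\, R_n \quad \text{and} \quad \widehat T_n = \left[(a_0' \otimes I_k)\,\widehat\Sigma_n^{-1}(a_0 \otimes I_k)\right]^{-1/2} (a_0' \otimes I_k)\,\widehat\Sigma_n^{-1} R_n, \qquad (6.35)$$

where $R_n = \operatorname{vec}\left((Z'Z)^{-1/2} Z'Y\right)$. Likewise, we define the feasible statistic $\widehat\psi_n$ as $\psi(S, T, \Sigma, D_Z)$ with the arguments being replaced by their sample analogues:

$$\widehat\psi_n = \psi\left(\widehat S_n, \widehat T_n, \widehat\Sigma_n, \widehat D_Z\right), \quad \text{where } \widehat D_Z = n^{-1}Z'Z. \qquad (6.36)$$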
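The feasible statistics in (6.35) are a few lines of linear algebra. A minimal NumPy sketch, assuming `sigma_hat` is a $2k \times 2k$ covariance estimate for $R_n$ (for example, built from a HAC estimator) and using symmetric matrix square roots; the function names are hypothetical.

```python
import numpy as np


def sym_power(A, p):
    """A**p for a symmetric positive definite matrix A (eigendecomposition)."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T


def feasible_s_t(Y, Z, sigma_hat, beta0):
    """Feasible S and T from (6.35); sigma_hat is a 2k x 2k covariance estimate for R_n."""
    k = Z.shape[1]
    a0 = np.array([[beta0, 1.0]])                       # a0' as a 1 x 2 row
    b0 = np.array([[1.0, -beta0]])                      # b0' as a 1 x 2 row
    A0 = np.kron(a0, np.eye(k))                         # a0' kron I_k  (k x 2k)
    B0 = np.kron(b0, np.eye(k))                         # b0' kron I_k  (k x 2k)
    # R_n = vec((Z'Z)^{-1/2} Z'Y); order="F" stacks columns, as vec does.
    R = (sym_power(Z.T @ Z, -0.5) @ Z.T @ Y).reshape(-1, order="F")
    sig_inv = np.linalg.inv(sigma_hat)
    S = sym_power(B0 @ sigma_hat @ B0.T, -0.5) @ B0 @ R
    T = sym_power(A0 @ sig_inv @ A0.T, -0.5) @ A0 @ sig_inv @ R
    return S, T, R
```

In a noise-free check with $Y = Z\pi a'$, one gets $R_n = a \otimes (Z'Z)^{1/2}\pi$, so $\widehat S_n$ reduces to $(\beta - \beta_0) C_{\beta_0} (Z'Z)^{1/2}\pi$.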


Assumption 3. The prior distribution for $(\beta, \pi)$ is absolutely continuous with respect to the Lebesgue measure in $\mathbb{R}^{k+1}$. Its density

$$w(\beta, \pi, D_Z) = w_1(\pi \mid \beta, D_Z) \cdot w_2(\beta, D_Z)$$

has full support and is a continuous function of $\pi$ and $\beta$.


Assumption 3 allows the density $w(\beta, \pi, \widehat D_Z)$ to depend on the data through $\widehat D_Z$. This generalization allows us to cover all tests considered here; the density asymptotically behaves as $w(\beta, \pi, D_Z)$ (and so we omit the dependence of the weights on $D_Z$ for convenience). Although the conditional density $w_1(\pi \mid \beta)$ does not depend on $\beta$ for the MM1 tests, it does depend on $\beta$ for the MM2 tests. Assumption 3 also guarantees that the priors for $\beta$ and $\pi$ are not dogmatic and will vanish asymptotically, as in the Bernstein-von Mises theorem. If we set the prior on $\mu$, then the associated prior on $\pi = (Z'Z)^{-1/2}\mu$ depends on the sample size. For example, the MM statistics use the prior $\mu \sim N(0, \sigma^2 I_k)$. For the associated prior $\pi \sim N\left(0, (\sigma^2/n)\, \widehat D_Z^{-1}\right)$ not to be sensitive to the sample size, the parameters $\sigma^2$ and $\zeta$ present in the MM1 and MM2 statistics must eventually grow at the rate $n$. We make the dependence of $\Lambda(\beta, \mu)$ on the sample size $n$ explicit and, hereinafter, use the notation $\Lambda_n$.

We now analyze the asymptotic behavior of the WAP similar and WAP-SU tests. Recall that both of these types of tests depend on the test statistic introduced in (2.10),

$$\frac{h_{\Lambda_n}(s,t)}{f^S_{\beta_0}(s) \cdot h^T_{\Lambda_n}(t)}. \qquad (6.37)$$
When instruments are weak, the numerator and denominator have the same order of magnitude. When instruments are strong, the integrands in the weighted densities $h_{\Lambda_n}(s,t)$ and $h^T_{\Lambda_n}(t)$ grow exponentially fast, and we can apply the Laplace approximation. Because both densities involve $k + 1$ integrals, the test statistic in (6.37) is again well behaved. The caveat is that a simple, closed-form approximation for $h^T_{\Lambda_n}(t)$ does not seem available under strong instruments. The WAP similar and WAP-SU tests, however, remain the same if we standardize (6.37) by any function of $t$. We replace $h^T_{\Lambda_n}(t)$ by $(1 + \|t\|)^{-1} h^T_{\Lambda_0,n}(t)$, where

$$h^T_{\Lambda_0,n}(t) = \int f^T_{\beta_0,\pi}(t)\, w(\beta_0, \pi)\, d\pi. \qquad (6.38)$$

The WAP similar and WAP-SU tests reject the null when

$$WAP = \frac{h_{\Lambda_n}(S,T)}{f^S_{\beta_0}(S) \cdot (1 + \|T\|)^{-1} h^T_{\Lambda_0,n}(T)} \qquad (6.39)$$

is larger than $\kappa_n(t)$ and $\kappa_n(s,t)$, respectively.
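The Laplace approximation invoked above replaces $\int \exp(-n\,q(x))\,w(x)\,dx$ by $\exp(-n\,q(\hat x))\,w(\hat x)\,(2\pi)^{d/2}\,|n\,\nabla^2 q(\hat x)|^{-1/2}$, with relative error of order $1/n$. The one-dimensional toy example below, with hypothetical choices $q(x) = \cosh x - 1$ and $w(x) = 1/(1+x^2)$, compares it against a trapezoid-rule integral; it illustrates the principle only and is not the WAP statistic itself.

```python
import numpy as np


def laplace_ratio(n, half_width=10.0, num=200_001):
    """Trapezoid-rule value of int exp(-n q(x)) w(x) dx divided by its Laplace approximation."""
    x = np.linspace(-half_width, half_width, num)
    q = np.cosh(x) - 1.0          # hypothetical q: minimized at x_hat = 0, with q''(0) = 1
    w = 1.0 / (1.0 + x**2)        # hypothetical smooth positive weight, w(0) = 1
    y = np.exp(-n * q) * w
    dx = x[1] - x[0]
    numeric = dx * (y.sum() - 0.5 * (y[0] + y[-1]))
    # exp(-n q(0)) * w(0) * sqrt(2 pi / (n q''(0))) reduces to sqrt(2 pi / n) here.
    laplace = np.sqrt(2.0 * np.pi / n)
    return numeric / laplace


ratios = {n: laplace_ratio(n) for n in (10, 100, 1000)}
```

The ratio approaches one as $n$ grows, which is the mechanism that lets the strong-instrument asymptotics reduce the WAP statistic to expressions evaluated at the maximizers.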

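Whether the instruments are weak or strong, we are able to obtain an approximation to (6.39). Define

$$n \cdot Q_n(\beta, \pi) = \frac{1}{2}\left(R - a \otimes (Z'Z)^{1/2}\pi\right)' \Sigma^{-1} \left(R - a \otimes (Z'Z)^{1/2}\pi\right) = \frac{1}{2}\left\| \begin{bmatrix} S \\ T \end{bmatrix} - \begin{bmatrix} (\beta - \beta_0) C_{\beta_0} \\ D_\beta \end{bmatrix} (Z'Z)^{1/2}\pi \right\|^2.$$

Appendix B shows that the WAP statistic is asymptotically equivalent to

$$\frac{\int \exp\left(-n \cdot Q_n(\beta, \hat\pi(\beta))\right) w(\beta, \hat\pi(\beta)) \left|(a' \otimes I_k)\,\Sigma^{-1}(a \otimes I_k)\right|^{-1/2} d\beta}{\exp\left(-\|S\|^2/2\right) \left[1 + \|T\|\right]^{-1} w(\beta_0, \hat\pi(\beta_0)) \left|(a_0' \otimes I_k)\,\Sigma^{-1}(a_0 \otimes I_k)\right|^{-1/2}}, \qquad (6.40)$$

where the constrained maximum likelihood estimator (MLE) for $\pi$ is

$$\hat\pi(\beta) = (Z'Z)^{-1/2} \left[(a' \otimes I_k)\,\Sigma^{-1}(a \otimes I_k)\right]^{-1} (a' \otimes I_k)\,\Sigma^{-1} R \qquad (6.41)$$

and

$$R = \Sigma\,(b_0 \otimes I_k) \left[(b_0' \otimes I_k)\,\Sigma\,(b_0 \otimes I_k)\right]^{-1/2} S + (a_0 \otimes I_k) \left[(a_0' \otimes I_k)\,\Sigma^{-1}(a_0 \otimes I_k)\right]^{-1/2} T.$$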
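The constrained MLE (6.41), the two expressions for $n \cdot Q_n$, and the decomposition of $R$ into $S$ and $T$ can be checked numerically. A minimal NumPy sketch with hypothetical helper names and randomly generated inputs; it also illustrates why $\exp(-\|S\|^2/2)$ appears in the denominator of (6.40): at $\beta = \beta_0$ the fitted value reproduces $T$ exactly, so $n \cdot Q_n(\beta_0, \hat\pi(\beta_0)) = \|S\|^2/2$.

```python
import numpy as np


def sym_power(A, p):
    """A**p for a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T


def _rows(beta0, k):
    """b0' kron I_k and a0' kron I_k as k x 2k matrices."""
    return np.kron(np.array([[1.0, -beta0]]), np.eye(k)), np.kron(np.array([[beta0, 1.0]]), np.eye(k))


def s_t_from_r(R, sigma, beta0, k):
    """S and T as functions of R (same formulas as (6.35) with Sigma known)."""
    B0, A0 = _rows(beta0, k)
    sig_inv = np.linalg.inv(sigma)
    S = sym_power(B0 @ sigma @ B0.T, -0.5) @ B0 @ R
    T = sym_power(A0 @ sig_inv @ A0.T, -0.5) @ A0 @ sig_inv @ R
    return S, T


def r_from_s_t(S, T, sigma, beta0, k):
    """Inverse map: R = Sigma (b0 kron I)[.]^{-1/2} S + (a0 kron I)[.]^{-1/2} T."""
    B0, A0 = _rows(beta0, k)
    sig_inv = np.linalg.inv(sigma)
    return (sigma @ B0.T @ sym_power(B0 @ sigma @ B0.T, -0.5) @ S
            + A0.T @ sym_power(A0 @ sig_inv @ A0.T, -0.5) @ T)


def constrained_mle_pi(R, sigma, ZtZ, beta):
    """pi_hat(beta) in (6.41): GLS of R on (a kron I_k), rescaled by (Z'Z)^{-1/2}."""
    k = ZtZ.shape[0]
    A = np.kron(np.array([[beta], [1.0]]), np.eye(k))     # a kron I_k, 2k x k
    sig_inv = np.linalg.inv(sigma)
    mu_hat = np.linalg.solve(A.T @ sig_inv @ A, A.T @ sig_inv @ R)
    return sym_power(ZtZ, -0.5) @ mu_hat


def n_times_q(R, sigma, ZtZ, beta, pi):
    """n Q_n(beta, pi) = 0.5 (R - a kron mu)' Sigma^{-1} (R - a kron mu), mu = (Z'Z)^{1/2} pi."""
    resid = R - np.kron([beta, 1.0], sym_power(ZtZ, 0.5) @ pi)
    return 0.5 * resid @ np.linalg.solve(sigma, resid)


def n_times_q_st(S, T, sigma, ZtZ, beta, beta0, pi):
    """Same quantity as 0.5 ||[S; T] - [(beta - beta0) C; D_beta] mu||^2."""
    k = ZtZ.shape[0]
    B0, A0 = _rows(beta0, k)
    A = np.kron(np.array([[beta], [1.0]]), np.eye(k))
    sig_inv = np.linalg.inv(sigma)
    C = sym_power(B0 @ sigma @ B0.T, -0.5)
    D_beta = sym_power(A0 @ sig_inv @ A0.T, -0.5) @ A0 @ sig_inv @ A
    mu = sym_power(ZtZ, 0.5) @ pi
    gap = np.concatenate([S - (beta - beta0) * C @ mu, T - D_beta @ mu])
    return 0.5 * gap @ gap
```

The equality of the two forms of $n \cdot Q_n$ follows because the map $R \mapsto (S, T)$ whitens $\Sigma$, so the $\Sigma^{-1}$-weighted residual norm equals the Euclidean norm in $(S, T)$ coordinates.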


The same approximation (6.40) holds for the WAP statistic where we replace $S$, $T$, and $\Sigma$ by their feasible versions given in (6.35). The resulting approximation to the WAP statistic is a function of $\widehat S_n$, $\widehat T_n$, and $\widehat D_Z$. The critical values for the WAP conditional tests and WAP-SU tests, respectively $\kappa_n(t)$ and $\kappa_n(s,t)$, are taken under the assumption that the $k$-dimensional vector $\widehat S_n$ has a standard normal distribution (in practice, these critical values are also functions of the consistent estimators $\widehat\Sigma_n$ and $\widehat D_Z$, but we omit this dependence for convenience). For example, for a given weight density $w(\beta, \pi)$, the critical function $\kappa_n(t)$ is simply the $1 - \alpha$ quantile of (6.40) given $T = t$.

The use of a Laplace approximation of the ratio of weighted averages under the alternative and the null is standard under the usual asymptotics. What is perhaps not standard is the additional term to absorb different rates and unify nonstandard asymptotics. Indeed, if we were to replace $h^T_{\Lambda_n}(t)$ only by $h^T_{\Lambda_0,n}(t)$, the numerator and denominator in (6.37) would have different orders of magnitude under strong instruments.


We now find the asymptotic distribution of the WAP tests under the WIV asymptotics. We make the following assumption.

Assumption WIV-FA. (a) $\pi = C/\sqrt{n}$ for some non-stochastic vector $C$.

(b) $\beta$ is a fixed constant for all $n \ge 1$.

(c) $k$ is a fixed positive integer that does not depend on $n$.

Under WIV, $\hat\pi(\beta)$ is $o_p(1)$, and the WAP statistics behave the same as if the weights were simply $w(\beta, 0)$. As $n \to \infty$, the finite-sample critical value functions $\kappa_n(t)$ and $\kappa_n(s,t)$ respectively converge to their asymptotic counterparts $\kappa_\infty(t)$ and $\kappa_\infty(s,t)$, which are based on (6.40) with $w(\beta, \hat\pi(\beta))$ replaced by $w(\beta, 0)$. We then obtain the following convergence by the continuous mapping theorem and the joint distribution

$$\begin{bmatrix} S_\infty \\ T_\infty \end{bmatrix} \sim N\left( \begin{bmatrix} (\beta - \beta_0) C_{\beta_0,\infty} \\ D_{\beta,\infty} \end{bmatrix} D_Z^{1/2} C, \; I_{2k} \right), \qquad (6.42)$$

where

$$C_{\beta_0,\infty} = \left[(b_0' \otimes I_k)\,\Sigma_\infty\,(b_0 \otimes I_k)\right]^{-1/2} \quad \text{and} \quad D_{\beta,\infty} = \left[(a_0' \otimes I_k)\,\Sigma_\infty^{-1}(a_0 \otimes I_k)\right]^{-1/2} (a_0' \otimes I_k)\,\Sigma_\infty^{-1}(a \otimes I_k).$$

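Theorem 2. Under Assumptions WIV-FA and 1-3:

(i) $(\widehat S_n, \widehat T_n) \to_d (S_\infty, T_\infty)$;

(ii) $P\left(WAP(\widehat S_n, \widehat T_n) > \kappa_n(\widehat T_n)\right) \to P\left(WAP(S_\infty, T_\infty) > \kappa_\infty(T_\infty)\right)$; and

(iii) $P\left(WAP(\widehat S_n, \widehat T_n) > \kappa_n(\widehat S_n, \widehat T_n)\right) \to P\left(WAP(S_\infty, T_\infty) > \kappa_\infty(S_\infty, T_\infty)\right)$.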


Both WAP conditional and WAP-SU tests have asymptotic null rejection probabilities equal to $\alpha$. The asymptotic power of the WAP tests has a complicated form under WIV asymptotics. We can, of course, rely on numerical simulations to compare their performance with other available tests. Below, we present power plots for testing the intertemporal elasticity of substitution based on the designs of Yogo (2004).

For strong instruments with local alternatives (SIV-LA), we consider the Pitman drift where $\beta$ is local to the null value $\beta_0$ as $n \to \infty$.
Assumption SIV-LA. (a) $\beta = \beta_0 + B/\sqrt{n}$ for some constant $B \in \mathbb{R}$.

(b) $\pi$ is a fixed non-zero $k$-vector for all $n \ge 1$.

(c) $k$ is a fixed positive integer that does not depend on $n$.

Under the SIV-LA asymptotics, the WAP statistics are shown to be increasing transformations of the LR statistic. This result is general and holds for any prior which satisfies Assumption 3.


Theorem 3. Suppose Assumptions SIV-LA and 1-3 hold. The long-run variance $\Sigma_\infty$ is known, or unknown but consistently estimable by $\widehat\Sigma_n$. Then the WAP similar and WAP-SU tests are asymptotically equivalent to the LR test given in (2.6).


Comments: 1. In the proof, we apply the Laplace approximation twice, first with respect to the integral for $\pi$ and then for $\beta$. For the MM1 and MM2 statistics, we can alternatively find a simple expression after integrating out the prior for the instruments' coefficients with $\sigma^2$ or $\zeta$ growing at rate $n$, and then applying the Laplace approximation for $\beta$. Both approaches coincide.

2. The SIV-LA behavior of the ECS (HAC-IV) test appears to be just a special case of our theory using Laplace approximations.

3. For higher-order expansions, we can use Watson's lemma; for references, we recommend Olver (1997) for deterministic functions and Onatski, Moreira, and Hallin (2014a, 2014b) for random functions.

4. Because $\widehat T_n/\sqrt{n} \to_p D_{\beta_0,\infty} D_Z^{1/2}\pi$ under SIV-LA, $\|\widehat T_n\|$ diverges to infinity w.p.a.1 (with probability approaching one). The critical value functions for both the WAP conditional and WAP-SU tests then collapse to the $1 - \alpha$ asymptotic (unconditional) quantile. As a result, the WAP conditional and WAP-SU tests are asymptotically similar and efficient under the SIV asymptotics.

The null rejection probability of WAP tests is $\alpha$ under WIV and SIV asymptotics. Pointwise convergence of the null rejection probability, of course, does not necessarily imply that the size is asymptotically $\alpha$ (in a uniform sense). Moreira (2003, p. 1037) suggests using Parzen (1954) and Andrews (1986) to ensure that size is uniformly controlled. A series of papers, including Andrews, Cheng, and Guggenberger (2011) and Andrews and Guggenberger (2014a), develop several powerful methods to check uniform size control, and these methods have been applied to many econometric models; see Andrews and Guggenberger (2010), Andrews and Guggenberger (2014a), and Mills, Moreira, and Vilela (2014b), among others. Conceivably, we can apply those methods to the WAP statistics coupled with the critical value functions $\kappa_n(t)$ and $\kappa_n(s,t)$. This line of research will be considered in a separate paper.

We can also analyze the WAP tests under strong instruments with fixed alternatives (SIV-FA). We follow Mills, Moreira, and Vilela (2014a) and make the following assumption.

Assumption SIV-FA. (a) $\beta = \beta_0 + B$ for some nonzero $B \in \mathbb{R}$.

(b) $\pi$ is a fixed non-zero $k$-vector for all $n \ge 1$.

(c) $k$ is a fixed positive integer that does not depend on $n$.

It is natural to expect that the power converges to one if the parameter $\beta$ is fixed. However, not all tests have this property, even in the IV model with homoskedastic errors; see Andrews, Moreira, and Stock (2004) and Mills, Moreira, and Vilela (2014a) for examples. Hence, it is important to establish consistency for the WAP tests.

If the parameter $\beta$ is fixed, the WAP statistics are proportional to the exponential of LR. Because LR/n converges to a non-zero constant, the WAP tests are consistent. The next theorem formalizes this result.


Theorem 4. Suppose Assumptions SIV-FA and 1-3 hold. The long-run variance $\Sigma_\infty$ is known, or unknown but consistently estimable by $\widehat\Sigma_n$. Then the following hold:

(i) $2 \cdot \log(WAP)/n = LR/n + o_p(1)$; and

(ii) $\widehat{LR}/n = LR/n + o_p(1) \to_p \gamma > 0$.

Comment: If $D_\beta \neq 0$, the functions $\kappa_n(t)$ and $\kappa_n(s,t)$ converge to a constant obtained under SIV-FA. If $D_\beta = 0$, the critical functions do not converge. However, they are bounded, and so the WAP tests are consistent.


7 Power Comparison 


In this section, we follow I. Andrews (2015), who calibrates designs for power comparison based on the work of Yogo (2004) on the elasticity of intertemporal substitution in eleven developed countries.


Yogo (2004) tests the effect of interest rates on the level of aggregate demand in an IV model. He considers a linear regression in which asset return affects consumption growth, and the reverse form of this regression. In both equations, the endogenous variable (consumption or asset return) can be correlated with the error (innovation). To remedy this problem, he chooses four instruments: lagged values of the nominal interest rate, inflation, consumption growth, and the log dividend-price ratio.


I. Andrews (2015) selects the real interest rate ($r_f$ in Yogo's (2004) notation) as the endogenous variable. Several tests perform well in his design, including the MM2-SU, PI-CLC, and (WAP similar) ECS tests. In fact, these tests show slightly different performance in only a few countries; see Section 7.2.1 of I. Andrews (2015). The difficulty in assessing the relative performance of each test arises because the instruments are not particularly weak in this design. Indeed, the first-stage F-statistic reported by Yogo (2004) (see his Table I) is below 10 in only four countries (Japan, Switzerland, the United Kingdom, and the United States). We instead join de Castro (2015) in choosing the real stock return ($r_e$ in Yogo's (2004) notation) as the endogenous variable. The instruments are considerably weaker in this design: the F-statistic is smaller than 4.18 in all countries, and always less than the F-statistic for the interest rate. Our decision to use stock returns aims to highlight the differences between the tests proposed for the HAC-IV model. Apart from using stock returns instead of interest rates, our design is akin to that of I. Andrews (2015). We use the Newey-West estimator with three lags, and the resulting power curves are based on 5,000 Monte Carlo simulations. In parallel to our asymptotic theory, we choose the ratio of the tuning parameters $\sigma^2$ and $\zeta$ to the sample size to be one-tenth for the MM1 and MM2 statistics, respectively.

Figure 3 plots power curves for the two-sided power envelope, Anderson-Rubin (AR), score (LM), WAP similar MM1, WAP similar MM2, and ECS (HAC-IV) tests. Although the AR and LM tests are unbiased, the MM1, MM2, and ECS tests perform unreliably. To illustrate the problem, we mention three countries. For Australia, the MM1 and ECS tests have low power in parts of the parameter space, while the MM2 test behaves more like a two-sided test. For France, the ECS test performs well, while both the MM1 and MM2 tests can have low power. For the USA, the ECS test has power near zero and behaves more like a one-sided test, while the MM1 and MM2 tests are nearly unbiased. In some countries, these three tests have power even lower than the Anderson-Rubin test (e.g., the ECS test for Germany and Italy).
Figure 3: Power Comparison (WAP similar tests)
We then compare power among two-sided tests which have arguably better performance. Figure 4 plots power curves for the two-sided power envelope, MM1-SU, MM2-SU, CQLR, CQLR-kron, and PI-CLC tests. All tests are adequate for two-sided hypothesis testing. The PI-CLC and CQLR-kron tests show some improvements over the CQLR test for some, but not all, countries. The MM1-SU test behaves similarly to the MM2-SU test for several countries, but it has considerably lower power for Japan and the United States. The MM2-SU test outperforms these tests, and when it occasionally has less power, the power loss is small. This application based on real data supports our theoretical contribution and the use of the MM2-SU test in practice.

Conceivably, the lower power of the MM1-SU test for Japan and the United States can be due to numerical integration over the whole real line. Power may be improved by transforming the parameter $\beta$ to the quantity $\theta = \tan^{-1}(d_\beta/c_\beta)$. This improvement is left for future work.

Figure 4: Power Comparison (two-sided tests)
8 Concluding Remarks

In this paper, we study the instrumental variable (IV) model with one endogenous regressor and heteroskedastic and autocorrelated (HAC) errors. The HAC-IV model with a known variance matrix is simpler than the model with an unknown but consistently estimable long-run variance. However, inference in both models is approximately the same whether or not the instruments are weakly correlated with the endogenous variable. This simplification allows us to develop a theory of optimal two-sided tests when the error stochastic process is of unknown form.
We find that a test that has correct size and is optimal under standard asymptotics may still have unacceptably low power in finite samples. This issue appears in several econometric models. For the HAC-IV model, we solve this problem by finding weighted-average power tests satisfying additional two-sided conditions. In this paper, we consider two possibilities: the locally unbiased (LU) and strongly unbiased (SU) conditions. While the local condition yields admissible tests, the stronger condition is easier to implement. Better yet, the MM1-SU and MM2-SU tests have power numerically very close to their LU versions. Numerical simulations also show that the MM2-SU test outperforms other tests proposed for the HAC-IV model.

The only other paper that satisfactorily addresses optimality of two-sided tests in the HAC-IV model is I. Andrews (2015). He explores linear combinations of the Anderson-Rubin and score statistics, with weights that depend on the conditioning statistic $T$. A class of these conditional linear combination (CLC) tests is unbiased and admissible in the conditional problem. By proposing a minimax regret criterion, he delivers a test that plugs in an estimator of the nuisance parameter. There is some power to be gained by broadening the focus beyond these three statistics. On the other hand, we impose $k$ additional constraints related to the SU condition. It would be interesting to reduce the required computational time, while maintaining the power gains of the MM2-SU test, by reducing the number of boundary conditions imposed when finding a WAP test.
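To make the CLC construction concrete, the following Python sketch computes the combination $a \cdot AR + (1-a) \cdot LM$ and a conditional critical value by simulation. It is a stylized sketch, not Andrews' implementation: it assumes that the pivotal statistic $S$ is $N(0, I_k)$ under the null and independent of $T$, with $AR = \|S\|^2$ and $LM = (S'd)^2$ for a unit vector $d$ that depends only on $T$. Under these assumptions, conditional on $T$, $LM \sim \chi^2_1$ and $AR - LM \sim \chi^2_{k-1}$ are independent, so the statistic equals $LM + a(AR - LM)$. The direction `d` and the weight `a` are hypothetical user-supplied placeholders; the sketch does not implement the minimax-regret choice of weights.

```python
import numpy as np


def clc_statistic(S, d, a):
    """Linear combination a*AR + (1 - a)*LM of Anderson-Rubin and score statistics.

    Assumptions (illustrative only): S is the length-k pivotal statistic,
    N(0, I_k) under the null; d is a direction vector computed from T alone;
    a in [0, 1] is a weight computed from T alone.
    """
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)      # normalize the T-dependent direction
    ar = float(S @ S)              # Anderson-Rubin statistic ||S||^2
    lm = float(S @ d) ** 2         # score statistic (S'd)^2
    return a * ar + (1.0 - a) * lm


def clc_critical_value(k, a, alpha=0.05, n_draws=200_000, seed=0):
    """Simulated (1 - alpha) quantile of LM + a*J, with LM ~ chi2(1), J ~ chi2(k-1).

    Valid conditional on T under the independence assumptions stated above.
    """
    rng = np.random.default_rng(seed)
    lm = rng.chisquare(1, n_draws)
    j = rng.chisquare(k - 1, n_draws) if k > 1 else np.zeros(n_draws)
    return float(np.quantile(lm + a * j, 1.0 - alpha))


if __name__ == "__main__":
    k, a = 5, 0.3
    rng = np.random.default_rng(1)
    d = rng.standard_normal(k)     # stands in for a direction computed from T
    cv = clc_critical_value(k, a)
    draws = rng.standard_normal((20_000, k))
    rate = np.mean([clc_statistic(s, d, a) > cv for s in draws])
    print(f"critical value {cv:.3f}, simulated null rejection rate {rate:.3f}")
```

Setting `a = 0` recovers the score test and `a = 1` the Anderson-Rubin test; intermediate values trade off between them, which is the degree of freedom a CLC rule chooses as a function of $T$.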

Finally, the asymptotic theory based on Laplace approximations developed in this paper is easily adaptable to other econometric models. For the HAC-IV model, it relies on the priors for the parameters $\beta$ and $\pi$ being insensitive to the sample size. For the MM1 and MM2 weights, this implies that the tuning parameters (used in the prior for $\mu$) eventually grow at the rate $n$. Some power gains with weak instruments may be possible when the tuning parameters are held constant. Another alternative is to find an automatic rate for these tuning parameters using a plug-in method. For example, we could let these parameters be proportional to either $\|T\|^2$ or $n \cdot \|\widetilde{\pi}(\beta_0)\|^2$. These quantities are stochastically bounded under weak instruments and grow at the rate $n$ under strong instruments (which ensures asymptotic optimality). Since the constrained MLE $\widetilde{\pi}(\beta_0)$ is a one-to-one transformation of $T$, these modified WAP-SU tests are still similar and uncorrelated with the pivotal statistic $S$ (and hence satisfy the SU condition).† We will consider this possibility in future work.
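A small Monte Carlo sketch can illustrate why a plug-in tuning parameter proportional to $\|T\|^2$ adapts its rate to instrument strength. It uses a stylized model, assumed here for illustration only, in which $T \sim N(m, I_k)$ with $m = c$ under weak instruments and $m = \sqrt{n}\,c$ under strong instruments; the function name `mean_norm_T_squared` and the vector `c` are hypothetical. Because $E\|T\|^2 = \|m\|^2 + k$, the average stays near $\|c\|^2 + k$ under weak instruments and grows like $n\|c\|^2$ under strong instruments.

```python
import numpy as np


def mean_norm_T_squared(n, c, weak, n_rep=2000, seed=0):
    """Monte Carlo average of ||T||^2 when T ~ N(m, I_k).

    Stylized assumption: m = c under weak instruments (first-stage
    coefficients of order c / sqrt(n)) and m = sqrt(n) * c under strong
    instruments (fixed first-stage coefficients).
    """
    rng = np.random.default_rng(seed)
    c = np.asarray(c, dtype=float)
    mean = c if weak else np.sqrt(n) * c
    T = mean + rng.standard_normal((n_rep, c.size))
    return float(np.mean(np.sum(T ** 2, axis=1)))


if __name__ == "__main__":
    c = [1.0, 0.5, 0.0]
    for n in (100, 1_000, 10_000):
        weak = mean_norm_T_squared(n, c, weak=True)
        strong = mean_norm_T_squared(n, c, weak=False)
        # A plug-in tuning parameter proportional to ||T||^2 stays O_p(1) under
        # weak instruments and scales like n under strong instruments.
        print(f"n={n:>6}: weak {weak:8.2f}   strong/n {strong / n:6.3f}")
```

In this toy setting the weak-instrument column does not move with $n$, while the strong-instrument column divided by $n$ settles near $\|c\|^2$, mirroring the two regimes described above.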